Abstract

We provide a scheme for inferring causal relations from uncontrolled
statistical data based on tools from computational algebraic geometry,
in particular, the computation of Groebner bases. We focus on causal
structures containing just two observed variables, each of which is
binary. We consider the consequences of imposing different restrictions
on the number and cardinality of latent variables and of assuming
different functional dependences of the observed variables on the latent
ones (in particular, the noise need not be additive). We provide an
inductive scheme for classifying functional causal structures into
distinct observational equivalence classes. For each observational
equivalence class, we provide a procedure for deriving constraints on
the joint distribution that are necessary and sufficient conditions for
it to arise from a model in that class. We also demonstrate how this
sort of approach provides a means of determining which causal parameters
are identifiable and how to solve for these. Prospects for expanding the
scope of our scheme, in particular to the problem of quantum causal
inference, are also discussed.


1 Introduction 

Causal relationships, unlike statistical dependences, support inferences
about the effects of interventions and the truth of counterfactuals.
While randomised controlled experiments can be used to determine causal
relationships, these may not be available for various reasons: they
could be prohibitively expensive, technologically infeasible, unethical
(e.g., assessing the effect of smoking on lung cancer), or indeed
physically impossible (e.g., for variables describing properties of
distant astronomical bodies). Therefore, inferring causal relationships
from uncontrolled statistical data is an important problem, with broad
applicability across scientific disciplines. Over the past twenty-five
years, there has been much progress in developing
methods to solve this problem d 12 EH 0. 

As has become standard practice, we formalize the notion 
of causal structure using directed acyclic graphs (DAGs) 
with random variables as nodes and arrows representing 
direct causal influence (DID- A more refined description 
of causal dependences specifies not only what causes what, but also, for
every variable, its functional dependence on its causal parents. We
shall use the term functional causal structure to refer to the
specification of the set of functions, which includes a specification of
the DAG. As is standard, the variables that are not observed are termed
latent, and the DAG does not include any latent variables that act as
causal mediaries, so that all the latent variables are parentless. We
shall use the term causal model to describe the functional causal
structure together with a specification, for each latent variable, of a
probability distribution over its values. Each causal model associated
to a given functional causal structure defines a possible joint
probability distribution over the observed variables. We are interested
in the set of possible joint distributions over the observed variables
for a given functional causal structure, that is, those that can arise
from some set of distributions on the latent variables. We will say that
two functional causal structures are observationally equivalent if they
are characterized by the same set of distributions over the observed
variables.¹

Many causal inference algorithms, such as those of H) 
and 0, only make use of conditional independence relations among the
observed variables. If two causal structures are such that the same set
of conditional independence relations is faithful to them, then they are
said to be Markov equivalent. Note that Markov equivalence can be
decided purely on the basis of the DAG (i.e., the causal structure),
while the notion of observational equivalence of interest here depends
on the functional dependences (i.e., the functional causal structure).
In the case of just two observed variables, which is the one we consider
here, the set of all causal structures is partitioned into just two
Markov equivalence classes: those wherein the variables are causally
connected, and those wherein they are not. As we show, however, the
joint distribution over the observed variables supports many more
inferences about the


1 This should not be confused with the notion of observational
equivalence as applied to DAGs m 






functional causal structure, thereby providing a more fine-grained
classification than is provided by Markov equivalence.

In recent years, several methods have been suggested that make use not
only of conditional independences, but also of other properties of the
joint statistical distribution between the observed variables GH3HHE3
(see also the works discussed in Secs. 6.2 and 6.3). These newer methods
also have limitations, in the sense that they impose restrictions on the
number of latent variables allowed in the underlying causal model and
also on the mechanisms by which these latent variables influence the
observed ones.

In the present work, we restrict attention to the causal inference
problem where there are just two observed variables, each of which is
binary (that is, discrete with just two possible values). We allow any
functional causal structure involving latent variables that are discrete
(with a finite number of values), and we impose no restriction on the
number of latent variables or the mechanisms by which these influence
the observed ones.

We provide an inductive scheme for characterizing the observational
equivalence classes of functional causal structures. This scheme has a
few steps. First, we show that, in each observational equivalence class,
there is a functional causal structure wherein all of the latent
variables are binary. Restricting ourselves to the latter sort of
functional causal structure, we show that one can inductively build up
any functional causal structure from a pair of others having fewer
latent variables. Thus, starting with functional causal structures with
no latent variables, we can recursively build up all functional causal
structures, and therefore all observational equivalence classes of
these, by applying our inductive scheme.

Using this scheme, we catalogue all observational equivalence classes
generated by functional causal structures with four or fewer binary
latent variables. We have evidence, but no proof yet, that our catalogue
is complete in the sense that a functional causal structure with any
number of binary latent variables (and hence, by the connection
described above, any functional causal structure with discrete latent
variables) belongs to one of the classes we have identified.

We also describe a procedure for deriving, for each class, the set of
necessary and sufficient conditions on the joint distribution over
observed variables for it to be possible to generate it from functional
causal structures in this class. We call such a set of conditions a
feasibility test for the class. The procedure for deriving these is as
follows. We start with a particular functional causal structure within
the class, express the parameters in the joint probability distribution
over the observed variables in terms of the parameters in the
probability distributions over the latent variables, then eliminate the
latter using techniques from algebraic geometry.

Finally, we consider applications to the problem of identifying causal
parameters. For the parameters describing the probability distributions
over the latent variables, we note that our technique allows one to find
expressions for these in terms of the observational data for each of the
observational equivalence classes that we have considered. For the
parameters describing the functional relations, we note that the limits
to what one can infer about these, which may be different for different
points in the space of possible joint distributions over the observed
variables, can be inferred from our feasibility tests.

Fig. 1: (a) DAG for the causal model defined by $A = B \oplus \lambda$
and $B = \nu$. (b) Joint distributions that can be generated by this
causal model.

2 Setting up the problem 

Consider the causal model of Fig. 1(a). From the DAG, it is clear that B
is a cause of A, while $\lambda$ is noise local to A and $\nu$ is noise
local to B. The functional dependences are given by
$A = B \oplus \lambda$ and $B = \nu$, where $\oplus$ denotes addition
modulo 2. A model with this sort of functional dependence is referred to
as an additive noise model (ANM) in Refs. |30 t 5] [6). The values of A,
for different values of B and $\lambda$, are given in the table below.


$\nu$    $\lambda$    $B$    $A = B \oplus \lambda$
  0          0         0              0
  0          1         0              1
  1          0         1              1
  1          1         1              0
In Ref. 0, it was shown that one can distinguish between the causal
model of Fig. 1(a) and the causal models depicted in Fig. 2(a) and
Fig. 2(c), except for special cases of the distributions over the noise
variables, such as, for instance, when $\lambda$ and $\nu$ are uniformly
distributed. Thus, if we are promised that the causal model is an ANM,
then (except for the special cases) we can distinguish between B causing
A, A causing B, and A and B being causally disconnected. To see how this
works, we will need to determine the correlations generated by this
model.

To describe the correlations we adopt the following notational
convention.

$P(A) = [x]$ means $P(A = x) = 1$,
$P(A, B) = [x][y] = [xy]$ means $P(A = x, B = y) = 1$,
$P(A) = q[x]$ means $P(A = x) = q$.

Let $q_1$ be the probability that $\nu = 0$ and $q_2$ be the probability
that $\lambda = 0$; then the correlations for the above causal model are

\[
P(A, B) = q_1 q_2 [00] + (1 - q_1)(1 - q_2)[01]
          + q_1 (1 - q_2)[10] + (1 - q_1) q_2 [11].
\]
Fig. 2: (a) DAG for the causal model defined by $A = \lambda$ and
$B = A \oplus \nu$. (b) Joint distributions that can be generated by the
model of (a). Note that this is a head-on view of a fan shape of the
same type as is depicted in (d). (c) DAG for the causal model defined by
$A = \lambda$ and $B = \nu$. (d) Joint distributions that can be
generated by the model of (c).

This means $P(A = 0, B = 0) = q_1 q_2$,
$P(A = 0, B = 1) = (1 - q_1)(1 - q_2)$, and so on. From now on, we will
use the shorthand $\bar{q}_i = 1 - q_i$ to simplify expressions.
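The joint distribution above can be checked by brute-force enumeration of the latent variables. The following Python sketch is illustrative only: the function name `anm_joint` and the use of exact rational arithmetic are our own choices, not part of the original analysis. It sums the weight of every latent configuration of the model $A = B \oplus \lambda$, $B = \nu$.

```python
from fractions import Fraction
from itertools import product


def anm_joint(q1, q2):
    """Joint distribution P(A, B) for the model A = B xor lam, B = nu.

    q1 = P(nu = 0) and q2 = P(lam = 0). Returns a dict keyed by (a, b).
    """
    p_nu = {0: q1, 1: 1 - q1}
    p_lam = {0: q2, 1: 1 - q2}
    joint = {(a, b): Fraction(0) for a, b in product((0, 1), repeat=2)}
    for nu, lam in product((0, 1), repeat=2):
        b = nu            # B = nu
        a = b ^ lam       # A = B xor lam
        joint[(a, b)] += p_nu[nu] * p_lam[lam]
    return joint


# Example with q1 = 1/3 and q2 = 1/4 (arbitrary nontrivial values).
p = anm_joint(Fraction(1, 3), Fraction(1, 4))
```

For these values the enumeration reproduces the four coefficients of the displayed formula, for example `p[(0, 0)]` equals $q_1 q_2$.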

Note that if a latent variable were to take one of its values with
probability 1, then it would be trivial and could be eliminated from the
functional causal structure. We therefore consider only functional
causal structures with nontrivial latent variables, that is, latent
variables that have some statistical variation in their value, so that
the probability of any value is bounded away from 0 and 1. In the
present example, therefore, $0 < q_1, q_2 < 1$.

For a general causal model we have
$P(A, B) = p_{00}[00] + p_{01}[01] + p_{10}[10] + p_{11}[11]$, where
$P(A = i, B = j) = p_{ij}$. We note that
$p_{00} + p_{01} + p_{10} + p_{11} = 1$. As we only need three real
parameters to specify $P(A, B)$, we can plot it in $\mathbb{R}^3$. It is
easy to see that the points
$\{P(A = i, B = j) = 1 : i, j \in \mathbb{Z}_2\}$ form the vertices of a
tetrahedron in $\mathbb{R}^3$, and so the plot of $P(A, B)$ must lie
within this tetrahedron.

We can rewrite $P(A, B)$ for our current example as

\[
P(A, B) = q_1 \left( q_2 [00] + \bar{q}_2 [10] \right)
          + \bar{q}_1 \left( q_2 [11] + \bar{q}_2 [01] \right).
\]

So, if we fix the value of $q_2$ in the range (0,1) and vary $q_1$ over
the interval (0,1), the plot of $P(A, B)$ consists of the line segment
joining a point on the edge of the tetrahedron containing the vertices
{[00], [10]} and a point on the edge containing the vertices
{[11], [01]} (but excluding these two endpoints). The full plot of
$P(A, B)$, as $q_1$ and $q_2$ each range over the interval (0,1), is
depicted in Fig. 1(b) (where the boundary points are excluded). We refer
to this shape as a fan. Fig. 2(b) and Fig. 2(d) depict the set of joint
distributions for the ANM where A causes B and for the causal structure
where A and B are causally disconnected, respectively.

Given some joint distribution, $P(A, B)$, how do we determine if it lies
on one of the fans of Fig. 1(b), Fig. 2(b) or Fig. 2(d)? Recall that,
because the latent variables are unobserved, we do not have access to
the $q_i$'s directly, only the observed $p_{ij}$'s. Thus, the problem
can be posed as follows: what are the defining equations of the fans in
terms of the observed $p_{ij}$'s?

This problem was solved for the example of Fig. 1 in Ref. 0 using the
following technique. First, it was noted that the DAG implies that
$\lambda$ is marginally independent of B, and therefore
$P(\lambda | B = 0) = P(\lambda | B = 1)$. Given that $\lambda$ is a
binary variable, this is true if and only if
$P(\lambda = 1 | B = 0) = P(\lambda = 1 | B = 1)$. We wish to eliminate
$\lambda$ from this condition. Recall from the definition of conditional
probability that
$P(\lambda = 1 | B = b) = P(\lambda = 1, B = b)/P(B = b)$. The
functional dependence $A = B \oplus \lambda$ can be used to conclude
that $P(\lambda = 1, B = b) = P(A = b \oplus 1, B = b)$. Note that this
last step is only possible because the noise is additive, so that one
can infer $\lambda$ from A and B. Therefore, reverting to our notational
conventions, where $P(A = 1 \oplus b, B = b) = p_{1 \oplus b, b}$ and
$P(B = b) = p_{0,b} + p_{1,b}$, the condition becomes

\[
\frac{p_{10}}{p_{00} + p_{10}} = \frac{p_{01}}{p_{11} + p_{01}},
\]

which can be rewritten as

\[
p_{00} p_{01} = p_{11} p_{10}.
\]

This equation, together with the open-interval constraints

\[
0 < p_{00}, p_{01}, p_{10}, p_{11} < 1,
\]

defines the fan in Fig. 1(b). Using similar techniques, one can show
that the fans in Figs. 2(b) and 2(d) are defined by the equations

\[
p_{00} p_{10} = p_{11} p_{01}
\]

and

\[
p_{00} p_{11} = p_{10} p_{01},
\]

respectively, together with the open-interval constraints.
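The three fan conditions can be turned directly into membership checks. The sketch below is a minimal illustration under our own naming (`fan_tests` and the dictionary labels are hypothetical, not taken from the paper); it uses exact fractions so that the equalities are tested without rounding error.

```python
from fractions import Fraction


def fan_tests(p00, p01, p10, p11):
    """Check which of the three fans a distribution P(A, B) lies on.

    Each test combines the open-interval constraints and normalisation
    with the equality that defines the corresponding fan.
    """
    probs = (p00, p01, p10, p11)
    inside = all(0 < x < 1 for x in probs) and sum(probs) == 1
    return {
        "B causes A (Fig. 1(b))": inside and p00 * p01 == p11 * p10,
        "A causes B (Fig. 2(b))": inside and p00 * p10 == p11 * p01,
        "disconnected (Fig. 2(d))": inside and p00 * p11 == p10 * p01,
    }


# A distribution generated by A = B xor lam, B = nu with q1 = 1/3, q2 = 1/4.
anm_result = fan_tests(Fraction(1, 12), Fraction(1, 2),
                       Fraction(1, 4), Fraction(1, 6))

# The uniform distribution is one of the special cases lying on all three fans.
uniform_result = fan_tests(*([Fraction(1, 4)] * 4))
```

With empirical frequencies the equalities would only hold approximately, so a practical test would need a statistical tolerance; that step is outside the scope of this sketch.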

The question is: how can one find feasibility tests for generic causal
models? In particular, how does one treat models where the noise is not
additive? Consider, for instance, the causal model that has the same DAG
as in Fig. 1(a), but where the noise is multiplicative, that is,
$A = B\lambda$. In this case, the value of $\lambda$ cannot be inferred
from A and B (given that these could be zero), and consequently one
cannot use the approach of Ref. 0. It is also unclear how one can
characterize the possibilities for the joint distribution when the
causal model involves an arbitrary number of latent variables. We will
show that these questions can be answered using powerful tools from
algebraic geometry, which we describe in the next section.

3 Deriving the feasibility tests 

We begin with an introduction to some of the main concepts of algebraic
geometry following the presentation given in Ref. 0. For a more detailed
discussion, see appendix A.

Denote the set of all polynomials in variables $x_1, \dots, x_n$ with
coefficients in some field $k$ by $k[x_1, \dots, x_n]$. When dealing
with polynomials, we are mainly interested in the solution set of
systems of polynomial equations. This leads us to the main geometrical
objects studied in algebraic geometry, algebraic varieties and
semi-algebraic sets.

An algebraic variety² $V(f_1, \dots, f_s) \subset k^n$ is the solution
set of the system of polynomial equations
$f_1(x_1, \dots, x_n) = \cdots = f_s(x_1, \dots, x_n) = 0$. A basic
semi-algebraic set is defined to be the solution set of a system of
polynomial equalities and inequalities, that is,
$\{x \in \mathbb{R}^n : g_i(x) \bowtie 0, \forall i = 1, \dots, n\}$,
where $g_1, \dots, g_n \in \mathbb{R}[x_1, \dots, x_n]$ are polynomials
over the reals³ and where $\bowtie$ corresponds to either $>$, $=$, or
$<$. Note that algebraic varieties are examples of basic semi-algebraic
sets. A semi-algebraic set is formed by taking finite combinations of
unions, intersections, or complements of basic semi-algebraic sets. For
instance, the fan in Fig. 1(b) is the semi-algebraic set that results
from the intersection of the algebraic variety defined by the single
polynomial equation $p_{00} p_{01} - p_{11} p_{10} = 0$ and the set of
inequalities that define the interior of the tetrahedral probability
simplex (requiring each probability to be in the interval (0,1)).

More generally, for any causal model, the set of possible joint
distributions that can be generated by it is represented by a
semi-algebraic set. It follows that two causal models are
observationally equivalent if and only if they generate the same
semi-algebraic set.

We now define ideals, the main algebraic object studied in algebraic
geometry. A subset $I \subset k[x_1, \dots, x_n]$ is an ideal if it
satisfies: (1) $0 \in I$, (2) if $f, g \in I$, then $f + g \in I$, and
(3) if $f \in I$ and $h \in k[x_1, \dots, x_n]$, then $hf \in I$.

A natural example of an ideal is the ideal generated by a finite number
of polynomials, defined as follows. Let $f_1, \dots, f_s$ be polynomials
in $k[x_1, \dots, x_n]$; then the ideal generated by $f_1, \dots, f_s$
is

\[
\langle f_1, \dots, f_s \rangle
  = \left\{ \sum_{i=1}^{s} h_i f_i : h_1, \dots, h_s \in k[x_1, \dots, x_n] \right\}.
\]

The polynomials $f_1, \dots, f_s$ are called a basis of the ideal.

Studying the relations between certain ideals and varieties forms one of
the main areas of study in algebraic geometry. One can even define the
algebraic variety $V(I)$ defined by the ideal
$I \subset k[x_1, \dots, x_n]$, where

\[
V(I) = \{(a_1, \dots, a_n) \in k^n : f(a_1, \dots, a_n) = 0, \forall f \in I\}.
\]

Interestingly, it can also be shown that if
$I = \langle f_1, \dots, f_s \rangle$, then $V(I) = V(f_1, \dots, f_s)$,
which is to say that the variety defined by a set of polynomials is the
same as the variety defined by the ideal generated by those polynomials.
Hence, varieties are determined by ideals.

We can now use the language of algebraic geometry to restate the
question asked at the end of the last section. Let

2 Also called an affine variety or an algebraic set.

3 Note that one can replace the real field R used in the last 
definition with any ordered field. 


$V \subset k^n$ be an algebraic variety given parametrically as

\[
\begin{aligned}
p_1 &= g_1(q_1, \dots, q_m), \\
    &\;\;\vdots \\
p_n &= g_n(q_1, \dots, q_m),
\end{aligned}
\tag{3.0.1}
\]


where the $g_i$ are polynomials in $q_1, \dots, q_m$. The conjunction of
the above equalities with the inequalities ensuring that the variables
$q_1, \dots, q_m$ are in the interval (0,1) (probabilities bounded away
from 0 and 1) defines a semi-algebraic set on
$p_1, \dots, p_n, q_1, \dots, q_m$. We seek to infer which values of
$p_1, \dots, p_n$ are possible for some values of the $q_1, \dots, q_m$
in their allowed intervals. By the Tarski-Seidenberg theorem CD, the
solution to this problem is also a semi-algebraic set. We determine the
latter as follows. First, we eliminate the variables $q_1, \dots, q_m$
to find a system of polynomial equations in $p_1, \dots, p_n$. These
define the smallest algebraic variety on $p_1, \dots, p_n$ that contains
the semi-algebraic set that we seek to characterize. This problem is
known as implicitization. The second step is to determine which points
in this algebraic variety can be extended to a solution of the
equalities and inequalities of the original parametric characterization.

For example, consider the algebraic variety that is defined
parametrically by the polynomial equations

\[
p_{00} = q_1 q_2, \quad p_{10} = q_1 \bar{q}_2, \quad
p_{01} = \bar{q}_1 \bar{q}_2, \quad p_{11} = \bar{q}_1 q_2.
\]

We would like to characterize the semi-algebraic set that this variety
defines on the observed variables $p_{00}, p_{01}, p_{10}, p_{11}$ alone
when one eliminates the parameters $q_1$ and $q_2$ while enforcing that
they are probabilities in (0,1). In Sec. 2 it was shown how one can do
so, and that the resulting semi-algebraic set is the one depicted in
Fig. 1(b). However, the technique was not generalizable to arbitrary
functional causal structures. Here, we reconsider this example using
techniques that are generally applicable.


The problem can be solved by employing a specific choice of basis for
the ideal generated by the system of polynomial equations that define
the variety (3.0.1). The basis that achieves this feat is known as the
Groebner basis.


Groebner bases simplify many calculations in algebraic 
geometry and they have many interesting properties 10. 
There are efficient algorithms for calculating Groebner 
bases and many software packages that one can use to implement them.

We discovered in this section that the fan of Fig. 1(b) is in fact the
intersection of the algebraic variety defined by the ideal

\[
\langle p_{00} - q_1 q_2,\; p_{10} - q_1 \bar{q}_2,\;
        p_{01} - \bar{q}_1 \bar{q}_2,\; p_{11} - \bar{q}_1 q_2 \rangle
\]

with the tetrahedron. The Groebner basis⁴ of this ideal is

4 with respect to the lex order
$q_1 > q_2 > p_{00} > p_{10} > p_{01} > p_{11}$, see appendix A.





found to be 

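\[
\begin{aligned}
g_1 &= q_1 + p_{01} + p_{11} - 1, \\
g_2 &= q_2 + p_{01} + p_{10} - 1, \\
g_3 &= p_{00} + p_{01} + p_{10} + p_{11} - 1, \\
g_4 &= p_{01}^2 + p_{01} p_{10} + p_{11} p_{01} - p_{01} + p_{10} p_{11}.
\end{aligned}
\]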
Solutions to $g_1 = \cdots = g_4 = 0$ provide solutions to

\[
p_{00} = q_1 q_2, \quad p_{10} = q_1 \bar{q}_2, \quad
p_{01} = \bar{q}_1 \bar{q}_2, \quad p_{11} = \bar{q}_1 q_2,
\]

which define our algebraic variety. Looking more closely at the Groebner
basis, we note that the variables $q_1, q_2$ have been eliminated from
the polynomials $g_3$ and $g_4$. The solution of
$g_3 = p_{00} + p_{01} + p_{10} + p_{11} - 1 = 0$ is exactly the
normalisation condition. The solution of $g_4 = 0$ gives us the
following:

\[
p_{01} (p_{10} + p_{01} + p_{11} - 1) + p_{10} p_{11} = 0,
\]

which, using the normalisation condition, then gives us

\[
p_{00} p_{01} = p_{10} p_{11}.
\]

On demanding $0 < p_{00}, p_{01}, p_{10}, p_{11} < 1$ and
$p_{ij} \in \mathbb{R}, \forall i, j$ (i.e., on taking the intersection
of this algebraic variety with the tetrahedron), we obtain the
semi-algebraic set corresponding to the fan of Fig. 1(b), which we
derived in section 2. This is a special case of a general result, known
as the elimination theorem, which provides us with a way of using
Groebner bases to systematically eliminate certain variables from a
system of polynomial equations and, thus, to solve the implicitization
problem.
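The elimination step can be reproduced with a computer algebra system. The sketch below uses SymPy's `groebner` function as one possible tool; the text does not say which software the authors used, so this choice and all variable names are our assumptions. The exact form of the basis returned may differ from the one quoted above, so the code only checks ideal membership rather than comparing polynomials term by term.

```python
from sympy import symbols, groebner

q1, q2, p00, p10, p01, p11 = symbols("q1 q2 p00 p10 p01 p11")

# Generators of the ideal for A = B xor lam, B = nu, with bar(q) = 1 - q.
generators = [
    p00 - q1 * q2,
    p10 - q1 * (1 - q2),
    p01 - (1 - q1) * (1 - q2),
    p11 - (1 - q1) * q2,
]

# Lex order with the latent parameters first, so that they are eliminated.
G = groebner(generators, q1, q2, p00, p10, p01, p11, order="lex")

# Elimination theorem: basis elements free of q1 and q2 generate the
# constraints on the observed probabilities alone.
eliminated = [g for g in G.exprs if not g.has(q1, q2)]

# Polynomials from the text that should belong to the ideal.
normalisation = p00 + p01 + p10 + p11 - 1
fan_equation = p00 * p01 - p10 * p11
in_ideal = G.contains(normalisation) and G.contains(fan_equation)
```

Printing `eliminated` shows the equality constraints on $p_{00}, p_{01}, p_{10}, p_{11}$; the inequality constraints still have to be imposed separately, as described in the text.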

The general procedure for finding the semi-algebraic set is as follows.
First, given the system of polynomial equations defining the
implicitization problem, as in Eq. (3.0.1), form the ideal generated by
these polynomials and compute⁵ its Groebner basis. The elements of this
basis that do not contain the variables $q_1, \dots, q_m$ constitute
constraints on the variables $p_1, \dots, p_n$ alone. These constraints
constitute polynomial equalities, and therefore define an algebraic
variety on the variables $p_1, \dots, p_n$. Second, we determine which
points on this variety correspond to solutions of the original
equalities and inequalities on $q_1, \dots, q_m$ and $p_1, \dots, p_n$.
This will result in inequality constraints. The equality constraints
from the first step and the inequality constraints from the second step
together characterize the semi-algebraic set on $p_1, \dots, p_n$ that
is compatible with the given functional causal structure. We note that
one trivial consequence of the fact that each of the $q_1, \dots, q_m$
is in the interval (0,1) is that each of the $p_1, \dots, p_n$ is in the
interval (0,1). As such, the semi-algebraic set we seek to characterize
is always a subset of the geometric intersection of the algebraic
variety we find in the first step and the probability simplex on
$p_1, \dots, p_n$. Note, however, that it is generically a strict subset
of this intersection.

These inequality constraints manifest themselves in different ways. We
present an example of one such manifestation below and leave the
remaining examples to appendix B.

5 with respect to the lexicographic order
$q_1 > q_2 > \cdots > q_m > p_1 > \cdots > p_n$.
Fig. 3: (a) $A = \mu\nu$ and $B = \mu\lambda$. (b) $p_{00} p_{11} > p_{01} p_{10}$.

Consider the causal model of Fig. 3(a). Defining $q_1$, $q_2$ and $q_3$
to be the probabilities for $\mu = 0$, $\nu = 0$ and $\lambda = 0$
respectively, the joint distribution generated by this model is

\[
P(A, B) = (q_1 + \bar{q}_1 q_2 q_3)[00] + \bar{q}_1 q_2 \bar{q}_3 [01]
          + \bar{q}_1 \bar{q}_2 q_3 [10] + \bar{q}_1 \bar{q}_2 \bar{q}_3 [11].
\tag{3.0.2}
\]

We begin by providing an intuitive account of the semi-algebraic set
describing such joint distributions. Note first that $P(A, B)$ can be
rewritten as

\[
P(A, B) = q_1 [00] + \bar{q}_1 \left( q_2 q_3 [00] + q_2 \bar{q}_3 [01]
          + \bar{q}_2 q_3 [10] + \bar{q}_2 \bar{q}_3 [11] \right),
\]

implying that it is the convex combination, with weight $q_1$, of the
point distribution [00], and with weight $\bar{q}_1$, of the
distribution arising from the functional causal structure of Fig. 2(c),
shown above to be characterized by the equality
$p_{00} p_{11} = p_{10} p_{01}$. It follows that the semi-algebraic set
defined by $P(A, B)$ contains all interior points on any line extending
from the vertex [00] to a point on the fan depicted in Fig. 2(d); this
set is depicted in Fig. 3(b).

Reading off the expressions for $p_{00}$, $p_{01}$, $p_{10}$, and
$p_{11}$ from Eq. (3.0.2), we obtain the set of polynomials that define
the full algebraic variety. The ideal generated by these is:

\[
\langle p_{00} - q_1 - \bar{q}_1 q_2 q_3,\; p_{01} - \bar{q}_1 q_2 \bar{q}_3,\;
        p_{10} - \bar{q}_1 \bar{q}_2 q_3,\; p_{11} - \bar{q}_1 \bar{q}_2 \bar{q}_3 \rangle.
\]

To implement the first step of the general procedure outlined above, we
derive the Groebner basis for this ideal:⁶

\[
\begin{aligned}
g_1 &= q_2 q_1 - q_1 - q_2 - p_{10} - p_{11} + 1, \\
g_2 &= q_3 q_1 - q_1 - q_3 - p_{01} - p_{11} + 1, \\
g_3 &= q_3 p_{10} + q_3 p_{11} - p_{10}, \\
g_4 &= q_2 p_{01} + q_2 p_{11} - p_{01}, \\
g_5 &= p_{00} + p_{01} + p_{10} + p_{11} - 1, \\
g_6 &= p_{11}^2 + p_{01} p_{10} + p_{11} p_{10} - p_{11} + p_{01} p_{11} + p_{11} q_1.
\end{aligned}
\]

Now $g_5 = 0$ is just the normalisation condition and $g_6 = 0$ gives
the following:

\[
p_{11} (p_{10} + p_{01} + p_{11} - 1) + p_{01} p_{10} + p_{11} q_1 = 0,
\]

6 with respect to the lex order
$q_1 > q_2 > q_3 > p_{00} > p_{01} > p_{10} > p_{11}$.











which, using the normalisation condition, results in

\[
q_1 = \frac{p_{11} p_{00} - p_{10} p_{01}}{p_{11}}.
\tag{3.0.3}
\]

To implement the second step of our procedure, we begin by enforcing
$q_1 > 0$. This results in the following inequality:

\[
p_{11} p_{00} > p_{10} p_{01}.
\]

None of the remaining constraints $0 < q_i < 1$, for $i \in \{2, 3\}$,
results in nontrivial relations among the $p_{ij}$'s, so the latter
inequality is the only nontrivial constraint. Together with the
open-interval constraints $0 < p_{00}, p_{01}, p_{10}, p_{11} < 1$, it
describes the necessary and sufficient conditions for the distribution
on observed variables to be compatible with the functional causal
structure of Fig. 3(a). These conditions define the semi-algebraic set
depicted in Fig. 3(b).
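As a numerical sanity check of Eq. (3.0.3) and of the resulting feasibility test, the following sketch generates a distribution from the model $A = \mu\nu$, $B = \mu\lambda$ by enumeration and then recovers $q_1$ from the observed probabilities alone. The function names (`multiplicative_joint`, `recover_q1`, `is_feasible`) are hypothetical helpers introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def multiplicative_joint(q1, q2, q3):
    """P(A, B) for A = mu*nu, B = mu*lam, with q1, q2, q3 the
    probabilities that mu, nu, lam equal 0, respectively."""
    joint = {(a, b): Fraction(0) for a, b in product((0, 1), repeat=2)}
    for mu, nu, lam in product((0, 1), repeat=3):
        weight = ((q1 if mu == 0 else 1 - q1)
                  * (q2 if nu == 0 else 1 - q2)
                  * (q3 if lam == 0 else 1 - q3))
        joint[(mu & nu, mu & lam)] += weight
    return joint


def recover_q1(p):
    """Eq. (3.0.3): q1 expressed through the observed probabilities."""
    return (p[(1, 1)] * p[(0, 0)] - p[(1, 0)] * p[(0, 1)]) / p[(1, 1)]


def is_feasible(p):
    """Feasibility test stated in the text for the model of Fig. 3(a)."""
    inside = all(0 < x < 1 for x in p.values()) and sum(p.values()) == 1
    return inside and p[(1, 1)] * p[(0, 0)] > p[(1, 0)] * p[(0, 1)]


# Arbitrary nontrivial latent distributions.
p = multiplicative_joint(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
q1_hat = recover_q1(p)
```

Here `q1_hat` coincides with the value of $q_1$ used to generate the data, which illustrates that this parameter is identifiable from the observed distribution.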

4 Characterizing the observational 
equivalence classes 

In this section, we will provide a scheme for inductively 
characterizing all observational equivalence classes. As 
noted in the introduction, we consider only causal models 
where there is a pair of binary observed variables, which 
we denote by A and B. 

4.1 Sufficiency of considering purely common-cause 
models 

A causal model having no directed causal influences 
between the observed variables will be termed purely 
common-cause. 

Lemma 4.1.1. Every causal model wherein there is a directed causal
influence between A and B (either $A \to B$ or $B \to A$) is
observationally equivalent to one that is purely common-cause.

The proof is as follows. Suppose that there is a directed causal
influence $B \to A$. If the collection of all latent variables is
denoted by $\lambda$, then a general causal model can be specified by
the functional dependences $B = f(\lambda)$ and $A = g(\lambda, B)$ for
some functions $f$ and $g$. But this is observationally equivalent to
the causal model that is purely common-cause with functional dependences
$B = f(\lambda)$ and $A = g'(\lambda)$, where
$g'(\lambda) = g(\lambda, f(\lambda))$. In characterizing the distinct
observational equivalence classes, therefore, it suffices for us to
consider the models that are purely common-cause, and therefore we
restrict our attention to these henceforth.

An explicit example serves to illustrate this equivalence. The causal
model depicted in Fig. 4(a), with functional dependences
$A = \lambda \oplus B$ and $B = \nu$, involving a directed causal
influence from B to A, is observationally equivalent to the causal model
depicted in Fig. 4(b), with functional dependences
$A = \lambda \oplus \mu$ and $B = \mu$, which is purely common-cause. To
see this, note that one can express the functional dependences of the
first causal model as $A = g(\lambda, \nu, B) = \lambda \oplus B$ and
$B = f(\lambda, \nu) = \nu$. Performing the substitution described in
the previous paragraph yields
$A = g(\lambda, \nu, f(\lambda, \nu)) = g'(\lambda, \nu) = \lambda \oplus \nu$,
which, on identifying $\nu$ with $\mu$, results in the second causal
model.

Fig. 4: (a) DAG for the causal model defined by $A = \lambda \oplus B$
and $B = \nu$. (b) DAG for the causal model defined by
$A = \lambda \oplus \mu$ and $B = \mu$.
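The substitution used in the proof of Lemma 4.1.1 can be illustrated in a few lines. This is a sketch under our own conventions: latent bits are passed as a tuple `(lam, nu)`, and the helper `joint_distribution` is hypothetical. The code compares the joint distribution of the model with a direct influence $B \to A$ against that of its purely common-cause counterpart.

```python
from fractions import Fraction
from itertools import product


def joint_distribution(model, probs_zero):
    """Distribution over (a, b), where model maps a tuple of latent bits
    to (a, b) and probs_zero[i] is the probability that latent i is 0."""
    dist = {}
    for bits in product((0, 1), repeat=len(probs_zero)):
        weight = Fraction(1)
        for bit, q in zip(bits, probs_zero):
            weight *= q if bit == 0 else 1 - q
        ab = model(bits)
        dist[ab] = dist.get(ab, Fraction(0)) + weight
    return dist


def f(latents):            # B = nu
    return latents[1]


def g(latents, b):         # A = lam xor B (direct influence of B on A)
    return latents[0] ^ b


def with_direct_influence(latents):
    b = f(latents)
    return (g(latents, b), b)


def g_prime(latents):      # g'(latents) = g(latents, f(latents))
    return g(latents, f(latents))


def purely_common_cause(latents):
    return (g_prime(latents), f(latents))


probs = (Fraction(1, 4), Fraction(1, 3))   # arbitrary P(lam = 0), P(nu = 0)
direct = joint_distribution(with_direct_influence, probs)
common = joint_distribution(purely_common_cause, probs)
```

The two dictionaries agree for any choice of `probs`, since the two models assign the same pair $(A, B)$ to every configuration of the latent variables.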

4.2 Sufficiency of considering models with binary 
latents 

We call a causal model where all the latent variables are binary a
causal model with binary latents. If there are n binary latent
variables, it is called an n-latent-bit causal model.

Theorem 4.2.1. Consider the family of causal models 
where the latent variables are discrete and finite, but not 
necessarily binary. Every such model is observationally 
equivalent to one with binary latents. Equivalently, there 
is a causal model with binary latents in each observational 
equivalence class. 

The proof is provided in appendix C, but we now present a simple example
which illustrates the main idea of the proof.

Consider the causal model of Fig. 5(a), where C, D are binary, but r is
a three-valued variable, i.e., a trit. Suppose the functional
relationships are as follows: C = r mod 2 and
D = (2(r ⊕_3 1)) mod 2, where ⊕_k means addition modulo k. The values of
C, D for different values of r are given in the table below.


r    C    D
0    0    0
1    1    1
2    0    1

One can see that the distributions over C, D that can be generated by
this model correspond to the face of the tetrahedron that contains the
vertices {[00], [11], [01]}.

The trick to simulating this model using a 2-latent-bit model is to
replace the latent three-valued variable r with a pair of binary
variables $\gamma$ and $\eta$ and to imagine that these are causally
related in the manner depicted in Fig. 5(b). That is, we imagine a
latent bit $\nu$ acting locally on $\gamma$ and a latent bit $\mu$
acting as a common cause of $\gamma$ and $\eta$, with the functional
dependence $\gamma = \mu\nu$ and $\eta = \mu$. This causal model can
generate any distribution over $\gamma$ and $\eta$ that has support only
on the values $(\gamma, \eta) \in \{(0, 1), (0, 0), (1, 1)\}$, as can be
seen by consulting the row containing class (2,1,b) id from the 3-page
table appearing later in this paper, where A and B play the role of
$\gamma$ and $\eta$.

Fig. 5: Example of how to reduce a causal model with a latent trit to
one involving only latent bits. (a) The original causal model, with
functional dependences C = r mod 2 and D = (2(r ⊕_3 1)) mod 2. (b) The
trit r is replaced by two bits, $\gamma$ and $\eta$, which are presumed
to be determined by a causal model having the depicted causal structure
with functional dependences $\gamma = \mu\nu$ and $\eta = \mu$. (c) The
causal model with binary latents that simulates the original model; the
functional dependences are $C = \nu\mu \oplus_2 \nu$ and $D = \nu$.

If we take $\gamma$ and $\eta$ to be related to r by
$r = (\gamma \bmod 3) \oplus_3 (\eta \bmod 3)$, so that the values
(0,0), (0,1) and (1,1) of $(\gamma, \eta)$ map respectively to the
values 0, 1 and 2 of r, then any distribution over r can be emulated by
some distribution over the values (0,0), (0,1) and (1,1) of
$(\gamma, \eta)$ and hence some distribution over $\mu$ and $\nu$.
Finally, we can express C and D explicitly in terms of $\mu$ and $\nu$
by eliminating $\gamma$ and $\eta$, obtaining the causal model depicted
in Fig. 5(c) with dependences $C = \nu\mu \oplus_2 \nu$ and $D = \nu$.
By construction, we must obtain precisely the same semi-algebraic set
for C and D in the model of Fig. 5(c) as one does in the model of
Fig. 5(a). We have therefore defined a 2-latent-bit model that simulates
our latent trit model.
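The claim that the 2-latent-bit model reproduces the latent trit model can be checked by enumeration. The sketch below takes the values of (C, D) for each trit value directly from the table in the text and uses the dependences $C = \nu\mu \oplus_2 \nu$, $D = \nu$ of Fig. 5(c). The helper `emulate`, which chooses the bit distributions for a given trit distribution, is our own construction for this illustration and assumes the trit takes the value 0 with probability strictly less than 1.

```python
from fractions import Fraction
from itertools import product

# (C, D) for each trit value r, copied from the table in the text.
TRIT_TABLE = {0: (0, 0), 1: (1, 1), 2: (0, 1)}


def two_bit_model(mu, nu):
    """Binary-latent model of Fig. 5(c): C = nu*mu xor nu, D = nu."""
    return ((nu & mu) ^ nu, nu)


def emulate(t):
    """Hypothetical helper: return (P(mu = 0), P(nu = 0)) that reproduce
    the trit distribution t = (t0, t1, t2). Requires t0 < 1."""
    q_nu = t[0]                  # (C, D) = (0, 0) occurs exactly when nu = 0
    q_mu = t[1] / (1 - t[0])     # (C, D) = (1, 1) needs nu = 1 and mu = 0
    return q_mu, q_nu


def two_bit_joint(q_mu, q_nu):
    dist = {}
    for mu, nu in product((0, 1), repeat=2):
        weight = ((q_mu if mu == 0 else 1 - q_mu)
                  * (q_nu if nu == 0 else 1 - q_nu))
        cd = two_bit_model(mu, nu)
        dist[cd] = dist.get(cd, 0) + weight
    return dist


t = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))
target = {TRIT_TABLE[r]: t[r] for r in range(3)}
result = two_bit_joint(*emulate(t))
```

Both `target` and `result` are supported on the three outcomes (0,0), (1,1) and (0,1), that is, on the face of the tetrahedron named in the text.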


The key ingredient of the above example was that we were able to find a
causal model which could, by appropriately varying over the distribution
of its latent variables, generate any distribution over a given face of
the tetrahedron, and hence any distribution on a trit. In the case of an
m-valued latent variable, however, one would need to find a k-latent-bit
model which could generate any distribution on an m-simplex. We provide
an inductive procedure for constructing such a latent-bit model in
appendix C.


Theorem 4.2.1 implies that for the project of determining
the observational equivalence classes, it suffices to consider
models with binary latents, and so we restrict our
attention to these henceforth.


4.3 Inductive scheme 

Next, we define a scheme for composing pairs of n-latent-bit
causal models into a single (n + 1)-latent-bit causal
model, such that if we start with all possible pairs of
n-latent-bit causal models and apply the composition
operation, we generate all possible (n + 1)-latent-bit causal
models.

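Denote the n latent binary variables by
$\lambda = (\lambda_1, \ldots, \lambda_n)$. A general n-latent-bit
causal model is then defined by the functional dependences

$$A = \sum_{\alpha} a_{\alpha} \lambda^{\alpha} \quad \text{and} \quad B = \sum_{\alpha} b_{\alpha} \lambda^{\alpha}, \qquad (4.3.1)$$

where $\lambda^{\alpha}$ is shorthand for the monomial
$\lambda_1^{\alpha_1} \cdots \lambda_n^{\alpha_n}$ for some set of
exponents $\alpha = (\alpha_1, \ldots, \alpha_n)$, and
$a_{\alpha}, b_{\alpha} \in \mathbb{Z}_2$ are parameters that specify
the nature of the functional dependences.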

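We assume that the first causal model is defined by
parameters $\{a_{\alpha}^{(0)}\}$ and $\{b_{\alpha}^{(0)}\}$, and the
second is defined by parameters $\{a_{\alpha}^{(1)}\}$ and
$\{b_{\alpha}^{(1)}\}$. The additional binary latent variable,
which supplements the n binary variables of the original two
models, is denoted $\delta$. The (n + 1)-latent-bit model which
is the composition of the two models is defined by the
functional dependences

$$A = \sum_{\alpha} \left[ (\delta \oplus 1)\, a_{\alpha}^{(0)} + \delta\, a_{\alpha}^{(1)} \right] \lambda^{\alpha},$$

$$B = \sum_{\alpha} \left[ (\delta \oplus 1)\, b_{\alpha}^{(0)} + \delta\, b_{\alpha}^{(1)} \right] \lambda^{\alpha}. \qquad (4.3.2)$$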

This construction has been chosen such that $\delta$ acts as a
switch variable: if we set $\delta = 0$ in the resulting
(n + 1)-latent-bit model, we recover the first n-latent-bit
model, while if we set $\delta = 1$, we recover the second
n-latent-bit model.


With these definitions, our composition result can be sum¬ 
marized as follows. 


Theorem 4.3.1. Consider the map that takes a pair of
n-latent-bit causal models defined by the functional
dependences of Eq. (4.3.1) with parameters
$\{a_{\alpha}^{(0)}\} \cup \{b_{\alpha}^{(0)}\}$ for the first model, and
parameters $\{a_{\alpha}^{(1)}\} \cup \{b_{\alpha}^{(1)}\}$ for the second
model, and returns the (n + 1)-latent-bit causal model
defined by the functional dependences of Eq. (4.3.2). Under
this map, the image of the set of all pairs of n-latent-bit
causal models is the set of all (n + 1)-latent-bit causal
models.


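Proof. The functional dependences of Eq. (4.3.2) can
equivalently be expressed as polynomials in $\lambda$ and $\delta$ as

$$A = \sum_{\alpha} \left( a_{\alpha}^{(0)} \lambda^{\alpha} + (a_{\alpha}^{(0)} \oplus a_{\alpha}^{(1)})\, \lambda^{\alpha} \delta \right),$$

$$B = \sum_{\alpha} \left( b_{\alpha}^{(0)} \lambda^{\alpha} + (b_{\alpha}^{(0)} \oplus b_{\alpha}^{(1)})\, \lambda^{\alpha} \delta \right). \qquad (4.3.3)$$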


It now suffices to note that as one varies over all possible
joint values for the variables in the set
$\{a_{\alpha}^{(0)}\} \cup \{a_{\alpha}^{(1)}\}$ (there are $2^{2^{n+1}}$
possibilities), one necessarily varies over all possible joint
values for the variables in the set
$\{a_{\alpha}^{(0)}\} \cup \{a_{\alpha}^{(0)} \oplus a_{\alpha}^{(1)}\}$,
which in turn implies that one is varying over all possible
polynomials in $\lambda_1, \ldots, \lambda_n$ and $\delta$ in the
expression for A. By a similar argument, as one varies
over all possible joint values for the variables in the set
$\{b_{\alpha}^{(0)}\} \cup \{b_{\alpha}^{(1)}\}$, one varies over all
possible polynomials in $\lambda_1, \ldots, \lambda_n$ and $\delta$ in
the expression for B. It follows that as one varies over all
possible joint values for the variables in the set
$\{a_{\alpha}^{(0)}\} \cup \{a_{\alpha}^{(1)}\} \cup \{b_{\alpha}^{(0)}\} \cup \{b_{\alpha}^{(1)}\}$,
one obtains all possible manners in which A and B might be
functionally dependent on the latent variables in the
(n + 1)-latent-bit causal model. Thus as one varies over all
possible pairs of n-latent-bit causal models in our
switch-variable construction, one varies over all possible
(n + 1)-latent-bit causal models. □

We can therefore generate all causal models with binary
latents by this inductive rule starting from the 0-latent-bit
causal models.
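
The switch construction of Eq. (4.3.2) and the counting argument in the proof can be checked by brute force for small n. The following is a minimal sketch, not the authors' implementation: it assumes a polynomial over Z_2 is stored as a dictionary from exponent tuples to coefficients, and the helper names (`evaluate`, `compose_switch`, `all_polynomials`, `truth_table`) are hypothetical. Only the dependence for A is treated, since B is handled identically.

```python
from itertools import product


def evaluate(coeffs, lam):
    """Evaluate sum_alpha c_alpha * lam^alpha over Z_2 at the bit tuple lam."""
    value = 0
    for alpha, c in coeffs.items():
        # The monomial lam^alpha equals 1 only if every selected bit is 1.
        if c and all(bit for bit, e in zip(lam, alpha) if e):
            value ^= 1
    return value


def compose_switch(c0, c1):
    """Coefficients of the composed polynomial, in the form of Eq. (4.3.3).

    Assumption of this sketch: the switch bit delta is appended as the
    last variable, so exponent (alpha, 0) carries a0_alpha and exponent
    (alpha, 1) carries a0_alpha xor a1_alpha.
    """
    composed = {}
    for alpha in c0:
        composed[alpha + (0,)] = c0[alpha]
        composed[alpha + (1,)] = c0[alpha] ^ c1[alpha]
    return composed


def all_polynomials(n):
    """Every polynomial over Z_2 in n bits with exponents in {0, 1}."""
    exps = list(product((0, 1), repeat=n))
    return [dict(zip(exps, bits)) for bits in product((0, 1), repeat=len(exps))]


def truth_table(coeffs, n):
    """Values of the polynomial on all 2**n assignments of the latent bits."""
    return tuple(evaluate(coeffs, lam) for lam in product((0, 1), repeat=n))


n = 2
models = all_polynomials(n)
images = set()
for c0, c1 in product(models, repeat=2):
    composed = compose_switch(c0, c1)
    for lam in product((0, 1), repeat=n):
        # delta = 0 recovers the first model, delta = 1 the second.
        assert evaluate(composed, lam + (0,)) == evaluate(c0, lam)
        assert evaluate(composed, lam + (1,)) == evaluate(c1, lam)
    images.add(truth_table(composed, n + 1))

# Check that every Boolean function of n + 1 bits appears in the image.
print(len(images) == 2 ** (2 ** (n + 1)))
```

The exhaustive loop is only practical for very small n, since the number of pairs grows doubly exponentially; it is meant as a sanity check of the theorem rather than as a way to enumerate models.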

4.4 Catalogue of observational equivalence classes 

Recall that two causal models are observationally equivalent
if they define the same semi-algebraic set. Thus,
to characterize the observational equivalence classes, we
proceed as follows. For each new causal model that we
generate by the inductive scheme, we determine the
corresponding semi-algebraic set. Every time we obtain a
semi-algebraic set that has not appeared previously, we add
it to the catalogue of observational equivalence classes.

Note that if a causal model has been obtained from two 
simpler models via our composition scheme, then the 
semi-algebraic set associated to it necessarily includes as 
subsets both of the semi-algebraic sets of the simpler mod¬ 
els (note that this semi-algebraic set is generally not the 
convex hull of the semi-algebraic sets of the two simpler 
models). It follows that if the semi-algebraic set of a given 
causal model is found to be the entire tetrahedron, then 
composing this model with any other will also yield the 
tetrahedron. In this case, there are no new observational 
equivalence classes to be found among the descendants of 
this causal model in the inductive scheme. 

In particular, if it were to occur that at some level of the
inductive scheme, every newly generated causal model could
be shown either to reduce to a previously generated causal
model or to yield a semi-algebraic set that is the entire
tetrahedron, then one could conclude that one's catalogue
of the observational equivalence classes of causal models
was complete in the sense that any n-latent-bit causal
model belongs to one of these classes.

We have used our inductive scheme to construct all ob¬ 
servational equivalence classes generated by causal mod¬ 
els with four or fewer binary latent variables. We have 
also considered a large number of causal models with five 
binary latent variables and found no new observational 
equivalence classes. This suggests that our catalogue may 
already be complete, although we do not have a proof of 
this. Above, we noted circumstances in which our induc¬ 
tive scheme would terminate, which provides one strategy 
for attempting to settle the question. Even in the absence 
of a proof of completeness, the inductive scheme presented 
here for classifying observational equivalence classes may 
be of independent interest to researchers in the field. 


The observational equivalence classes of causal models
that we have obtained (which cover all causal models with
four or fewer binary latent variables) are presented in the
table covering the next three pages. For each class, we
depict the semi-algebraic set that defines the class, the
feasibility test for the class, and a representative causal
model from the class. Note that the open-interval constraints
$0 < p_{00}, p_{01}, p_{10}, p_{11} < 1$ are part of every feasibility
test unless explicitly stated otherwise. The corresponding
constraint on the affine varieties is that those varieties
confined to the edges exclude the vertices, those confined to
the faces exclude the edges, and those in the bulk exclude
the faces.

The task of describing the catalogue is simplified by the 
fact that many of the observational equivalence classes are 
related to one another by simple symmetries. We therefore 
organize the classes into orbits, where an orbit is a set of 
classes whose elements are related to one another by a set 
of symmetry transformations. For one of the classes in the 
orbit (which we term the ‘fiducial’ class), we provide a full 
description, and below this description, we specify the set 
of symmetry transformations that must be applied to it to 
obtain the other elements of the orbit. Formally, this is a 
set of representatives of the right cosets of the subgroup of 
symmetries of the semi-algebraic set in the full symmetry 
group of the tetrahedron. 

We express these representatives as compositions of the
following set of symmetry transformations, which we define
below: $\{\mathrm{Id}, f_A, f_B, S, X\}$. For each of the five, we
specify both their action on the causal model, i.e., their
action on the functional dependences, from which their
action on the DAG can be inferred, and on the elements of
the joint distribution $\{p_{ab} : a, b \in \mathbb{Z}_2\}$, from which
their action on the feasibility test can be inferred. Each
symmetry transformation also defines an action on the
tetrahedron in an obvious manner. Id is the identity
transformation, leaving the model and $p_{ab}$ invariant; $f_A$ is the
bit flip on A, replacing the functional dependence
$A = f(\lambda)$ with $A = f(\lambda) \oplus 1$ and mapping
$p_{ab} \to p_{a \oplus 1, b}$; $f_B$ is the bit flip on B, defined
analogously to $f_A$; S is the swap transformation, replacing
the functional dependences $A = f(\lambda)$, $B = g(\lambda)$ with
$A = g(\lambda)$, $B = f(\lambda)$, and mapping $p_{ab} \to p_{ba}$;
X is the "add B to A" transformation, replacing the functional
dependences $A = f(\lambda)$, $B = g(\lambda)$ with
$A = f(\lambda) \oplus g(\lambda)$, $B = g(\lambda)$ and mapping
$p_{ab} \to p_{a \oplus b, b}$. We denote a composition of two
symmetry transformations by a right-to-left product: for
instance, a bit flip on A followed by a swap is denoted
$S f_A$. The conjunction of a bit flip on A and a bit flip on B
yields the same transformation regardless of the order in
which they are implemented and is denoted $f_{AB}$.
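
The action of the five transformations on the joint distribution can be sketched in a few lines. This is an illustrative sketch under stated assumptions: a distribution is stored as a dictionary keyed by the outcome pair (a, b), and the function names (`flip_a`, `flip_b`, `swap`, `add_b_to_a`, `compose`, `group_closure`) are hypothetical. The closure computed at the end is a check that these generators produce all 24 symmetries of the tetrahedron.

```python
from itertools import product

VERTICES = list(product((0, 1), repeat=2))  # outcome pairs (a, b)


def identity(p):
    return dict(p)


def flip_a(p):
    # f_A: the entry at (a, b) is taken from (a xor 1, b).
    return {(a, b): p[(a ^ 1, b)] for a, b in VERTICES}


def flip_b(p):
    # f_B: the entry at (a, b) is taken from (a, b xor 1).
    return {(a, b): p[(a, b ^ 1)] for a, b in VERTICES}


def swap(p):
    # S: the entry at (a, b) is taken from (b, a).
    return {(a, b): p[(b, a)] for a, b in VERTICES}


def add_b_to_a(p):
    # X: the entry at (a, b) is taken from (a xor b, b).
    return {(a, b): p[(a ^ b, b)] for a, b in VERTICES}


def compose(*maps):
    """Right-to-left product: compose(swap, flip_a) applies flip_a first."""
    def composed(p):
        for m in reversed(maps):
            p = m(p)
        return p
    return composed


def signature(m):
    """Identify a transformation by how it permutes four distinct labels."""
    probe = {v: i for i, v in enumerate(VERTICES)}
    out = m(probe)
    return tuple(out[v] for v in VERTICES)


def group_closure(generators):
    """All distinct transformations obtainable as products of the generators."""
    seen = {signature(identity): identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            hg = compose(h, g)
            s = signature(hg)
            if s not in seen:
                seen[s] = hg
                frontier.append(hg)
    return seen


p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
s_fa = compose(swap, flip_a)  # "S f_A": bit flip on A followed by a swap
print(s_fa(p))
print(len(group_closure([flip_a, flip_b, swap, add_b_to_a])))
```

Representing each transformation as a function on distributions keeps the right-to-left convention explicit: the rightmost factor of a product is the one applied first.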

Finally, a given observational equivalence class will be
distinguished by a label of the form $(n, m, x)_g$. Here, n is the
number of binary latent variables in the causal model, m
is the number of these that act as common causes, x is
an optional label that is used for distinguishing functional
dependences that are consistent with a given (n, m) but
are observationally inequivalent, and g labels the symmetry
transformation that relates the class to the fiducial class
for the particular n, m and x. The set of possibilities
for g for a given n, m, and x is a subgroup of the
group closure of $\{\mathrm{Id}, f_A, f_B, S, X\}$, which we denote by
G(n, m, x). Note that $n, m \in \mathbb{N}$, $m \le n$, and we take
$x \in \{a, b, c, \ldots\}$.

[Table, one page of three. Columns: Class; Semi-algebraic set;
Test for feasibility; Minimal causal model. The graphical
entries and several formulas are not recoverable; the legible
entries follow in their order of appearance.]

$(3,1,a)_{\mathrm{id}}$; test for feasibility $p_{00} p_{11} > p_{01} p_{10}$; $G(3,1,a) = \{\mathrm{Id}, f_A\}$

$G(3,1,b) = \{\mathrm{Id}, f_{AB}\}$

$A = \mu \oplus \nu$, $B = \mu \oplus \lambda$; $G(3,1,c) = \{\mathrm{Id}\}$

$(3,2,a)_{\mathrm{id}}$; $A = \mu\nu \oplus \mu\lambda \oplus 1$, $B = \mu\nu \oplus 1$; $G(3,2,a)$ has four elements, of which $\mathrm{Id}$ and $f_A X S$ are legible

$G(3,2,b) = \{\mathrm{Id}, f_{AB} X S, f_B X S, f_A\}$

$(3,2,c)_{\mathrm{id}}$; $A = \mu\nu$, $B = \mu \oplus \nu \oplus \delta$; $G(3,2,c)$ includes $\mathrm{Id}, f_B X S, X S, f_A X S, f_A X, f_{AB} X, f_A S X S, X$ (remaining elements illegible)

The first few steps of our iterative procedure for the con¬ 
struction of causal models proceed as follows. 

The semi-algebraic sets associated to the four 0-latent-bit
causal models are the four vertices of the tetrahedron,
labelled by the deterministic assignments to A and B, that is,
as [00], [10], [01] and [11]. These correspond to the classes
$\{(0,0)_g : g \in \{\mathrm{Id}, f_A, f_B, f_{AB}\}\}$, depicted in the first
row of the table (because there is only one observational
equivalence class with n = 0 and m = 0, the label x is not
necessary in this case and so it is omitted from the
name of the class).

One finds that by composing these with one another into
1-latent-bit causal models, one arrives at six new
observational equivalence classes. Four of these correspond
to models with a single latent bit that acts locally, and
their semi-algebraic sets are the four edges of the
tetrahedron with endpoints {[00], [01]}, {[10], [11]}, {[00], [10]},
{[01], [11]}, which we might call the AB-uncorrelated
edges; these correspond to the classes
$\{(1,0)_g : g \in \{\mathrm{Id}, f_A, S, f_B S\}\}$, depicted in the second
row of the table. Two of these correspond to models with a
single latent bit acting as a common cause, and their
semi-algebraic sets are the [00]-[11] and [01]-[10] edges of the
tetrahedron, which we might call the AB-correlated edges,
corresponding to the classes $\{(1,1)_g : g \in \{\mathrm{Id}, f_A\}\}$,
depicted in the third row of the table.
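
The edges described above can be reproduced numerically by pushing a distribution over the latent bits through the functional dependences. The sketch below assumes that the latent bits are independent, with q[i] the probability that bit i equals 1; the function name `joint_distribution` and the particular example models are illustrative choices, not taken from the table.

```python
from itertools import product


def joint_distribution(f, g, q):
    """Return p[(a, b)] for A = f(lam), B = g(lam).

    Assumption: the latent bits are independent and q[i] = P(lam_i = 1).
    """
    p = {(a, b): 0.0 for a, b in product((0, 1), repeat=2)}
    for lam in product((0, 1), repeat=len(q)):
        weight = 1.0
        for bit, qi in zip(lam, q):
            weight *= qi if bit else 1.0 - qi
        p[(f(lam), g(lam))] += weight
    return p


# One latent bit acting locally on B: support on the [00]-[01] edge.
local = joint_distribution(lambda lam: 0, lambda lam: lam[0], [0.3])

# One latent bit acting as a common cause: support on the [00]-[11] edge.
common = joint_distribution(lambda lam: lam[0], lambda lam: lam[0], [0.3])

# Two local bits: A and B are independent, so p00 * p11 = p01 * p10.
product_model = joint_distribution(lambda lam: lam[0], lambda lam: lam[1], [0.3, 0.6])

print(local)
print(common)
print(product_model)
```

Varying the entries of q over the open interval (0, 1) traces out the corresponding semi-algebraic set, which is how one can explore a given model before deriving its feasibility test.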

Next, one constructs all of the 2-latent-bit causal models
and finds their semi-algebraic sets. This set includes the
model of Fig. 2(c), where both latent bits act locally, and
whose semi-algebraic set is the fan of Fig. 2(d), which
touches each of the AB-uncorrelated edges of the
tetrahedron, corresponding to the single class $(2,0)_{\mathrm{id}}$,
depicted in the fourth row of the table. This set also includes
the models of Fig. 1(a) and Fig. 2(a), where one of the latent
bits acts as a common cause and whose semi-algebraic sets
are the fans of Fig. 1(b) and Fig. 2(b), which touch the
AB-correlated edges of the tetrahedron. They correspond
to the pair of classes $\{(2,1,a)_g : g \in \{\mathrm{Id}, S\}\}$, depicted
in the fifth row of the table. There is also a second type
of 2-latent-bit causal model where one latent bit acts as a
common cause, which yields the semi-algebraic sets
corresponding to the four faces of the tetrahedron. These are
the four classes $\{(2,1,b)_g : g \in \{\mathrm{Id}, f_A, f_B, S\}\}$ in the
table. When both of the latent variables act as common
causes, one obtains semi-algebraic sets that are subsets of
a face of the tetrahedron and which have the appearance
of the StarFleet insignia from Star Trek, of which there
are twelve in total. These are the classes labelled $(2,2)_g$
in the table. The construction of 3-latent-bit and
4-latent-bit causal models proceeds similarly, and the new
observational equivalence classes one thereby obtains are
depicted in the rest of the table.


5 Identification of parameters in the causal 
model 


Our results also have applications for the identification 
problem, that is, the problem of determining which param¬ 
eters in a causal model can be identified or bounded using 
observational data. 

Consider the problem of identifying the probability
distributions over the latent variables (our $q_j$ parameters) in a
causal model associated to a given functional causal
structure. From the description of our algorithm, it is clear
that the $q_j$ parameters are generally identifiable because
the Groebner basis provides a means of computing
expressions for them in terms of our $p_i$ parameters (the
observational data). Indeed, we often must compute the explicit
expressions for one or more of the $q_j$ in terms of the $p_i$
as an intermediate step on the way to deriving our
feasibility tests. Eq. (3.0.3) is an example of such an
identification formula.

The other sort of parameter of a causal model that one may 
wish to identify is the nature of the functional dependences 
(assuming the model is indeed functional). For the sorts 
of models we consider, this problem is also solved by our 
results. 

Consider the problem where the causal structure is given,
but where there is uncertainty over the nature of the
functional dependences thereon. For instance, suppose that it
is known that the functional causal structure is either the
minimal structure associated to the class $(2,1,a)_{\mathrm{id}}$ in our
table or the one associated to $(2,1,b)_{\mathrm{id}}$. Because the
semi-algebraic sets defining these two classes do not intersect,
it is clear that one can settle this decision problem on the
basis of the observational data.

As another, more nuanced example, suppose that it is
known that the functional causal structure is the minimal
structure associated to one of the three classes $(3,1,a)_{\mathrm{id}}$,
$(3,1,b)_{\mathrm{id}}$, and $(3,1,c)_{\mathrm{id}}$ in our table. Here, one finds
that certain points in the space of distributions over the
observed variables pass the feasibility test for just one of
these functional causal structures, other points pass the test
for two of them, while still others pass the test for all three.

More generally, one might know only the causal structure.
For instance, the set of possible functional causal
structures might be the minimal ones in each of the classes in
the set $\{(2,1,a)_g : g \in \{\mathrm{Id}, S\}\} \cup \{(2,1,b)_g : g \in \{\mathrm{Id}, f_A, f_B, S\}\}$.
The feasibility tests we have derived
provide a means of determining, for any given point in
the space of distributions over the observed variables,
precisely which of these functional causal structures are
compatible with that observational data.
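
The decision procedure described above amounts to running each candidate feasibility test on the observed distribution. As a minimal sketch, the following covers only the four face classes $(2,1,b)_g$, whose tests can be read off from the text: the fiducial class has support on (A, B) in {(0,0), (0,1), (1,1)}, i.e. $p_{10} = 0$, and the other three faces follow by applying $f_A$, $f_B$ and S as defined in Section 4.4. The tests for the $(2,1,a)_g$ classes are not reproduced in this section and are therefore left out. The dictionary `FACE_CLASSES`, the function name and the numerical tolerance are assumptions of this sketch.

```python
from itertools import product

VERTICES = list(product((0, 1), repeat=2))  # outcome pairs (a, b)

# Outcome (a, b) whose probability must vanish on each face.
# The assignment of faces to g is derived here from the stated action
# of f_A, f_B and S on the fiducial support; it is not copied from the table.
FACE_CLASSES = {
    "(2,1,b)_Id": (1, 0),
    "(2,1,b)_fA": (0, 0),
    "(2,1,b)_fB": (1, 1),
    "(2,1,b)_S": (0, 1),
}


def feasible_face_classes(p, tol=1e-12):
    """Names of the face classes whose feasibility test the distribution p passes.

    Following the open-interval convention, a face excludes its edges:
    exactly one probability vanishes and the other three are positive.
    """
    names = []
    for name, zero in FACE_CLASSES.items():
        others = [v for v in VERTICES if v != zero]
        if abs(p[zero]) <= tol and all(p[v] > tol for v in others):
            names.append(name)
    return names


on_face = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.0, (1, 1): 0.3}
in_bulk = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
print(feasible_face_classes(on_face))
print(feasible_face_classes(in_bulk))
```

With finite-sample data an exact zero is never observed, so in practice the tolerance would have to be replaced by a statistical criterion.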


7 Recall our convention of demanding the probabilities for
latent variables to be bounded away from 0 and 1, so that all of our
semi-algebraic sets are confined to the interior of the tetrahedron.





6 Discussion 

6.1 Future directions 

The restriction to pairs of binary observed variables is a
limitation of our analysis. In future work, we hope to
extend our approach to cases where the observed variables
have an arbitrary number of values and where the number
of observed variables is also arbitrary. While the tools
from algebraic geometry employed in this paper provide
a procedure for deriving feasibility tests for such
functional causal structures in principle, in practice it is
unlikely that such procedures will be scalable. Indeed,
calculating Groebner bases is an EXPSPACE-complete
problem [9]. Nevertheless, it may still be possible to develop
new tools for causal inference in these cases using the
approach described here.

It also remains an open problem to decide, for any given 
functional causal structure, which observational equiva¬ 
lence class it belongs to. That is, even if our catalogue of 
classes is complete, it merely establishes that every func¬ 
tional causal structure falls into one of these classes, but 
it does not provide a means of deciding, for a given func¬ 
tional causal structure, having an arbitrary number of la¬ 
tent variables and functional dependences, which class it 
is a member of. Of course, if one supplements a given 
functional causal structure with distributions over the la¬ 
tent variables, then one obtains a joint distribution over the 
observed variables and this can be subjected to the feasibil¬ 
ity tests for different observational equivalence classes. It 
is likely, however, that there are better ways of solving the 
classification problem, for instance, by determining how 
the functional dependences can be simplified. Solving this 
classification problem would allow one to find common 
features of all of the functional causal structures in a given 
class, for instance, features of the topology of the causal 
structure. 

We have here made the idealization that the uncontrolled 
statistical data is given as a joint distribution whereas in 
practice it is a finite sample from this distribution. To con¬ 
tend with this idealization, one should in practice evaluate 
causal models by considering how well the finite statistical 
data can be fit to them. 

6.2 Relevance to quantum foundations 

One of the motivations of the current work was the 
prospect of new insights into the interplay between causal 
structure and observed correlations in quantum theory. In 
particular, for a pair of quantum systems—each subjected 
to one of a set of possible measurements—a Bell inequal¬ 
ity [Him 12] is a constraint on the joint probability dis¬ 
tribution over the outcomes of each possible choice of the 
local measurements (that is, for every combination of local 
measurement settings). It has recently been noted [ 13 , 14 ] 
that one can understand the assumptions required to derive 
a Bell inequality as the standard assumptions for causal 
inference together with a particular hypothesis about the 
underlying causal structure, namely, that each local out¬ 
come depends causally on the corresponding local setting 


and on a latent common cause between the two systems.
This causal structure is illustrated in Fig. 6, where A and
B are the measurement outcomes for each quantum
system, X and Y are the local choices of measurement, and
$\mu$ is the latent common cause. The complete set of Bell
inequalities for this scenario, therefore, can be understood
as a feasibility test for such a causal model.

The problem considered in the current work is different
from that of deriving the complete set of Bell inequalities
in a couple of ways. (i) The observational input to our
causal inference problem is different; there are no setting
variables in our problem (that is to say, any variable
distinct from the observed A, B appearing in a causal
structure must be latent) and therefore our input is a single
joint distribution over two observed binary variables rather
than a set of such distributions, one for each choice of the
setting variables. (ii) The hypotheses whose feasibility we
are testing are different; while the set of all Bell
inequalities provides a test of the feasibility of the causal structure
illustrated in Fig. 6, we here seek to assess the feasibility of
a causal structure for a given choice of cardinalities for the
latent variables appearing therein (e.g., whether a given
latent variable consists of a single bit, two bits, etcetera) and
for a given choice of the precise form of the functional
dependence of the observed variables on the latent variables.

Consider the Bell scenario of Fig. 6, where A, B, X, and Y
are binary variables. To define a functional causal
structure, one must supplement this causal structure with a
hypothesis about the cardinality of $\mu$ and a hypothesis about
the function f that maps X, $\mu$ to A and the function g that
maps Y, $\mu$ to B. (Given that there are only 16 possible
values of A, B, X, and Y, a $\mu$ of cardinality 16 is
sufficient to simulate any other case.) The conditional
distributions P(A, B|X, Y) compatible with this functional causal
structure are

$$P(A, B | X, Y) = \sum_{\mu} P(A, B | X, Y, \mu)\, q_{\mu}, \qquad (6.2.1)$$

where $q_{\mu}$ denotes a probability distribution over the
latent variable $\mu$. To determine the semi-algebraic set of
possibilities for P(A, B|X, Y) that are compatible with
this functional causal structure, one could use the
techniques of the present article. From the polynomial
equalities that hold between the P(A, B|X, Y) and the $q_{\mu}$ (those
given in Eq. (6.2.1)), one seeks to obtain constraints on the
P(A, B|X, Y) alone by eliminating the $q_{\mu}$. Because the
variables to be eliminated appear linearly, to eliminate the
$q_{\mu}$ it suffices to use quantifier elimination techniques that
are less computationally demanding than implicitization,
such as Fourier-Motzkin elimination.
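
The linear structure of Eq. (6.2.1) can be made concrete by enumerating the 16 values of the latent variable as pairs of deterministic response functions. The sketch below is an illustration under stated assumptions: the encoding of the latent variable, the function names, and the choice of the CHSH expression as the quantity to evaluate are not taken from the text. It computes the conditional distribution as a mixture and evaluates the CHSH expression, whose magnitude is known not to exceed 2 for any such mixture.

```python
from itertools import product

# A response function maps a binary setting to a binary outcome and is
# stored as (outcome for setting 0, outcome for setting 1).
RESPONSES = list(product((0, 1), repeat=2))
# Each value of the latent variable picks one response for A and one for B.
STRATEGIES = list(product(RESPONSES, RESPONSES))  # 16 values


def conditional(q):
    """P[(a, b, x, y)] = sum over mu of q[mu] * [a = f(x, mu)] * [b = g(y, mu)]."""
    P = {key: 0.0 for key in product((0, 1), repeat=4)}
    for (f, g), weight in zip(STRATEGIES, q):
        for x, y in product((0, 1), repeat=2):
            P[(f[x], g[y], x, y)] += weight
    return P


def chsh(P):
    """E(0,0) + E(0,1) + E(1,0) - E(1,1), with E(x,y) the expectation of (-1)^(a xor b)."""
    total = 0.0
    for a, b, x, y in P:
        sign = -1.0 if (a ^ b ^ (x & y)) else 1.0
        total += sign * P[(a, b, x, y)]
    return total


deterministic_values = []
for i in range(len(STRATEGIES)):
    q = [0.0] * len(STRATEGIES)
    q[i] = 1.0
    deterministic_values.append(chsh(conditional(q)))

uniform = [1.0 / len(STRATEGIES)] * len(STRATEGIES)
print(max(deterministic_values), min(deterministic_values))
print(chsh(conditional(uniform)))
```

Because P depends linearly on q, the extreme values of any linear expression such as CHSH are attained at the deterministic strategies, which is why enumerating those 16 points suffices here.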

Note that if some observed correlations violate an
inequality derived in this fashion, it only establishes the
infeasibility of a given classical functional causal structure.
Violation of Bell inequalities, on the other hand, rules out the
feasibility of the causal structure, regardless of the
cardinality of the latent variables and the nature of the
functional dependences. In this sense, deriving Bell inequalities
is more challenging than deriving feasibility tests for
functional causal structures. However, in another sense,
deriving Bell inequalities is more straightforward because
the semi-algebraic set defined by Eq. (6.2.1) is a
polytope, whereas for a general functional causal structure
this is not the case. The mathematical tools that have
been used to derive Bell-type inequalities, which include
semi-definite and linear programming as well as
Fourier-Motzkin elimination [17, 18], are therefore quite
different from those used here.

Fig. 6: In the Bell scenario, one is interested in the
conditional distribution P(A, B|X, Y). This is equivalent to
a set of distributions over A, B, $\{P_{(x,y)}(A, B)\}$, one for
each choice of measurement setting.

Bell inequalities are significant to the foundations of
quantum theory because they are found to be violated in
experiments on pairs of separated quantum systems, implying
that the predictions of quantum theory cannot be explained
by a classical causal model with the causal structure that
one expects to hold for the experiment (that of Fig. 6)
without fine-tuning.

Researchers in the field of quantum foundations have
now begun to apply insights obtained from the study of
Bell inequalities to the problem of deriving constraints
on observed correlations in more general causal
scenarios, and the current work constitutes another
contribution in this direction.

More importantly, there are now a few proposals for how
to generalize the standard notion of a causal model to the
quantum realm. Ref. [24], for instance, proposes a
definition of a quantum causal model in terms of a
noncommutative generalization of conditional probability, while
Refs. [19, 25, 26] follow a more operational approach.
With a notion of quantum causal model in hand, one can
explore the problem of inferring facts about the quantum
causal model from observed correlations. This is the
problem of quantum causal inference.

In the case of Bell-type experiments, for instance, one
expects a quantum causal model with the natural causal
structure (that of Fig. 6) to be feasible only if the observed
correlations satisfy the so-called Cirel'son bound, which is
a generalization of a Bell inequality. A simple case
of quantum causal inference that has been investigated
recently is the problem of distinguishing a cause-effect
relation from a common-cause relation. Here, it has been
shown that the quantum correlations can distinguish the
two cases even in uncontrolled experiments, implying a
quantum advantage for causal inference.

In quantum causal models, variables are replaced by
systems, each represented by a Hilbert space, and one makes a
distinction between observed systems, upon which a
measurement is made, and latent systems. Sets of systems are
described by joint quantum states (as opposed to the joint
probability distributions that describe sets of variables),
and the functional dependences are specified by unitary
maps. A natural analogue of the classical causal inference
problem is to make inferences about the causal structure
and the parameters of the causal model given a joint
quantum state on the observed systems. The natural analogue
of the functional causal structures considered in this
article are quantum causal structures together with a
specification of the dimensions of the latent systems and the
unitaries that describe the functional dependences. To derive
a feasibility test for a functional causal structure, one must
eliminate the real-valued parameters that specify the
quantum state of the latent systems. For example, if a given
latent system is 2-dimensional (the quantum analogue of a
binary latent variable), there are three real-valued
parameters needed to specify the state completely (as opposed
to the one real parameter needed to completely specify a
distribution over a classical bit). The expectation values
of the three Pauli operators, for instance, suffice to do so.
Nonetheless, one can still take advantage of the techniques
from algebraic geometry employed in this work to
eliminate these parameters and determine constraints on the
quantum state of the observed systems. In this way, we
ought to be able to derive feasibility tests for functional
causal structures in the quantum sphere.
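
For concreteness, the three-parameter description of a 2-dimensional latent system mentioned above can be written in the standard Bloch-vector form. This is textbook material rather than a formula from the present article, and the symbols $\rho$, $r_i$ and $\sigma_i$ are introduced here only for illustration:

$$\rho = \tfrac{1}{2}\left( I + r_x \sigma_x + r_y \sigma_y + r_z \sigma_z \right), \qquad r_i = \mathrm{Tr}(\rho\, \sigma_i), \qquad r_x^2 + r_y^2 + r_z^2 \le 1.$$

A distribution over a classical bit corresponds to the special case $r_x = r_y = 0$, where the single remaining parameter is $r_z = 1 - 2q$, with q denoting the probability of the value 1. The parameters to be eliminated are then the $r_i$ of each latent system, subject to the stated norm constraint.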

6.3 Related work 

The extent to which the mathematical tools associated to
quantifier elimination are well-suited to problems of causal
inference has been previously emphasized by Geiger and
Meek [29]. Many authors have noted, in particular, the
applicability of quantifier elimination to the problem of
deriving tests for the feasibility of a causal structure when
the cardinality of the latent variables is known. Ref. [29],
for instance, used Cylindrical Algebraic Decomposition to
derive equality and inequality constraints for a particular
causal model. However, the computational complexity of
such brute-force quantifier elimination (doubly
exponential in the number of parameters) means that its
applications are limited to very simple examples.

Many previous works have appealed to implicitization
procedures using Groebner bases to obtain equality
constraints for causal models. Geiger and Meek [30],
Garcia, Stillman and Sturmfels [31], and Garcia [32] have
used implicitization to obtain the smallest algebraic
variety that contains the semi-algebraic set of joint
distributions over observed variables for various causal structures
with known cardinalities of latent variables. This yields
polynomial equalities on the joint distribution whose
satisfaction is a necessary condition for compatibility with
the causal structure. Kang and Tian [33] have also
applied implicitization techniques to the problem of
identifying polynomial equality constraints on observational and
interventional distributions (using the framework supplied
by Refs. [34, 35]).




Our work goes beyond these treatments insofar as it uses
implicitization as one step in an algorithm that finds the
semi-algebraic set itself rather than the smallest algebraic
variety containing it. The second step is to use the
extension theorem (described in Appendix A) to find
inequality constraints on the joint probability distribution over
observed variables from knowledge of the Groebner basis. To
illustrate the difference, consider the observational
equivalence class labelled $(2,2)_{\mathrm{id}}$ in our classification. This
corresponds to a semi-algebraic set for which the smallest
algebraic variety containing it is the plane $p_{10} = 0$. The
intersection of this variety with the tetrahedral
probability simplex is its $p_{10} = 0$ facet. The semi-algebraic set,
however, is a strict subset of this, the one satisfying the
additional inequality $(p_{01} + 2p_{11} - 2)^2 \ge 4p_{00}$.
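
The gap between the variety and the semi-algebraic set can be seen numerically. The sketch below uses a hypothetical 2-latent-bit model, A = mu AND nu and B = mu OR nu with independent latent bits, chosen only because both bits act as common causes and the model never produces the outcome (A, B) = (1, 0); it is not claimed to be the representative model of the class. The inequality is read as non-strict, and the function names and tolerance are assumptions of this sketch.

```python
from itertools import product
import random


def and_or_distribution(q_mu, q_nu):
    """Joint distribution of A = mu AND nu, B = mu OR nu for independent bits."""
    p = {(a, b): 0.0 for a, b in product((0, 1), repeat=2)}
    for mu, nu in product((0, 1), repeat=2):
        weight = (q_mu if mu else 1 - q_mu) * (q_nu if nu else 1 - q_nu)
        p[(mu & nu, mu | nu)] += weight
    return p


def on_variety(p, tol=1e-9):
    """Equality constraint only: the plane p10 = 0."""
    return abs(p[(1, 0)]) <= tol


def in_semialgebraic_set(p, tol=1e-9):
    """Equality constraint plus the inequality (p01 + 2 p11 - 2)^2 >= 4 p00."""
    slack = (p[(0, 1)] + 2 * p[(1, 1)] - 2) ** 2 - 4 * p[(0, 0)]
    return on_variety(p, tol) and slack >= -tol


random.seed(0)
samples = [and_or_distribution(random.random(), random.random()) for _ in range(1000)]
print(all(in_semialgebraic_set(p) for p in samples))

# A point on the p10 = 0 facet that fails the inequality.
facet_point = {(0, 0): 0.45, (0, 1): 0.10, (1, 0): 0.0, (1, 1): 0.45}
print(on_variety(facet_point), in_semialgebraic_set(facet_point))
```

For this model the slack equals the squared difference of the two latent probabilities, so the inequality is saturated exactly when they coincide; a test based on the equality constraint alone would wrongly accept the second point.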

One novel feature of our approach which distinguishes it 
from previous uses of implicitization is that we focus on 
deriving feasibility constraints for a causal structure with 
specific functional dependences. In previous approaches, 
the set of variables that needed to be eliminated included 
both the parameters describing the probability distribu¬ 
tions for the root variables and the parameters describing 
the conditional probability distributions for each non-root 
variable. In our approach, the second sort of parameter is 
fixed and not in need of elimination. The restriction to bi¬ 
nary variables ensures that the number of distinct possible 
functional dependences is relatively modest. 

Finally, the use of Groebner bases in identifying or
bounding parameters in a causal model has also been highlighted
in previous work such as Garcia-Puente et al. [36].

After the completion of this work, we became aware of
related independent works by Chaves [37] and Rossett et
al. [38] which also derive nonlinear inequalities for determining
the feasibility of certain causal structures. These
authors consider structures which, like Bell scenarios, have
multiple pairs of observed variables that are related as
cause and effect (understood as setting-outcome pairs) but
which, unlike Bell scenarios, can have more than one latent
common cause acting on the outcome variables. Chaves
simplifies the quantifier elimination problem that must
be solved using a round of Fourier-Motzkin elimination,
while Rossett et al. provide an inductive approach for
deriving new inequalities from given inequalities for subgraphs
of the causal network under consideration. Combining
our approach with these other methods constitutes
an interesting direction for future work.
Appendices

A Ideals, varieties and Groebner bases 


We now introduce useful concepts and tools from algebraic geometry that we will make use of in solving the problem 
mentioned in section 2 of the main text. We will follow the presentation given in 0. 

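We define a monomial in $x_1, \ldots, x_n$ to be a product of the form $x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$, where the exponents are non-negative
integers, $\alpha_i \in \mathbb{Z}_{\geq 0}$ for $i = 1, \ldots, n$. We can simplify our notation slightly by letting $\alpha = (\alpha_1, \ldots, \alpha_n)$ and setting

$$x^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}.$$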

We can now define a polynomial over a field k. 

Definition A.0.1. A polynomial $f$ in $x_1, \ldots, x_n$ with coefficients in a field $k$ is a finite linear combination of monomials.
We write $f$ as

$$f = \sum_{\alpha} c_{\alpha} x^{\alpha}, \qquad c_{\alpha} \in k,$$

where the sum is taken over a finite number of $\alpha$'s.

The set of all polynomials in $x_1, \ldots, x_n$ with coefficients in $k$ is denoted $k[x_1, \ldots, x_n]$. When we deal with polynomials
we are mainly interested in the solution set of systems of polynomial equations. This leads us to the main geometrical
objects studied in algebraic geometry, algebraic varieties and semi-algebraic sets, which we now define.

Definition A.0.2. Let $k$ be a field and let $f_1, \ldots, f_s$ be polynomials in $k[x_1, \ldots, x_n]$. Then we set

$$V(f_1, \ldots, f_s) = \{(a_1, \ldots, a_n) \in k^n : f_i(a_1, \ldots, a_n) = 0, \ \forall\, 1 \leq i \leq s\}.$$

We call $V(f_1, \ldots, f_s)$ the algebraic variety (also called the affine variety) defined by $f_1, \ldots, f_s$.

Thus, an algebraic variety $V(f_1, \ldots, f_s) \subseteq k^n$ is the solution set of the system of polynomial equations $f_1(x_1, \ldots, x_n) =
\cdots = f_s(x_1, \ldots, x_n) = 0$. A basic semi-algebraic set is defined to be the solution set of a system of polynomial equalities
and inequalities, that is:

Definition A.0.3. A basic semi-algebraic set is defined by $\{x \in \mathbb{R}^n : g_i(x) \bowtie 0, \ \forall\, i = 1, \ldots, m\}$, where $g_1, \ldots, g_m \in
\mathbb{R}[x_1, \ldots, x_n]$ are polynomials over the reals$^{8}$ and where $\bowtie$ corresponds to either $>$, $=$, or $<$.

Note that algebraic varieties are examples of basic semi-algebraic sets. 

Definition A.0.4. A semi-algebraic set is formed by taking finite combinations of unions, intersections, or complements 
of basic semi-algebraic sets. 

For any causal model, the set of possible joint distributions that can be generated by it is represented by a semi-algebraic
set. It follows that two causal models are observationally equivalent if and only if they generate the same semi-algebraic
set.

We now introduce and define ideals, the main algebraic object studied in algebraic geometry.

Definition A.0.5. A subset $I \subseteq k[x_1, \ldots, x_n]$ is an ideal if it satisfies:

1. $0 \in I$,

2. If $f, g \in I$, then $f + g \in I$,

3. If $f \in I$ and $h \in k[x_1, \ldots, x_n]$, then $hf \in I$.

A natural example of an ideal is the ideal generated by a finite number of polynomials. 

Definition A.0.6. Let $f_1, \ldots, f_s$ be polynomials in $k[x_1, \ldots, x_n]$. Then we set

$$\langle f_1, \ldots, f_s \rangle = \left\{ \sum_{i=1}^{s} h_i f_i : h_1, \ldots, h_s \in k[x_1, \ldots, x_n] \right\}.$$

It is not hard to show that $\langle f_1, \ldots, f_s \rangle$ is an ideal. We call it the ideal generated by $f_1, \ldots, f_s$ and we call $f_1, \ldots, f_s$ a
basis of the ideal.

8 Note that one can replace the real field R used in the last definition with any ordered field. 



Studying the relations between certain ideals and varieties forms one of the main areas of study in algebraic geometry.
One can even define the algebraic variety $V(I)$ defined by the ideal $I \subseteq k[x_1, \ldots, x_n]$, where

$$V(I) = \{(a_1, \ldots, a_n) \in k^n : f(a_1, \ldots, a_n) = 0, \ \forall f \in I\}.$$

The proof that $V(I)$ forms an algebraic variety can be found in (7). Interestingly, it can also be shown that if $I =
\langle f_1, \ldots, f_s \rangle$, then $V(I) = V(f_1, \ldots, f_s)$. That is to say that varieties are determined by ideals. This will have interesting
consequences for us, as we will see shortly.

To find a general solution to the implicitization problem introduced in the main text we need to introduce monomial 
orderings and Groebner bases. 

First, note that we can reconstruct the monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ from the $n$-tuple of exponents $(\alpha_1, \ldots, \alpha_n) \in \mathbb{Z}^n_{\geq 0}$. This
establishes a one-to-one correspondence between $\mathbb{Z}^n_{\geq 0}$ and monomials in $k[x_1, \ldots, x_n]$. It follows that any ordering $>$
on the space $\mathbb{Z}^n_{\geq 0}$ will induce an ordering on monomials: if $\alpha > \beta$ according to this ordering, then we will also say that
$x^{\alpha} > x^{\beta}$.

Now, we want the induced ordering to be ‘compatible’ with the algebraic structure of the polynomial ring that our mono¬ 
mials live in. This requirement leads us to the following definition. 

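Definition A.0.7. A monomial ordering on $k[x_1, \ldots, x_n]$ is any relation $>$ on $\mathbb{Z}^n_{\geq 0}$ satisfying:

1. $>$ is a total ordering on $\mathbb{Z}^n_{\geq 0}$. That is to say that, for every $\alpha, \beta \in \mathbb{Z}^n_{\geq 0}$, either $\alpha > \beta$, $\beta > \alpha$ or $\alpha = \beta$.

2. If $\alpha > \beta$ and $\gamma \in \mathbb{Z}^n_{\geq 0}$, then $\alpha + \gamma > \beta + \gamma$.

3. $>$ is a well ordering on $\mathbb{Z}^n_{\geq 0}$. This means that every non-empty subset of $\mathbb{Z}^n_{\geq 0}$ has a smallest element under $>$.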

The main monomial ordering we will make use of here is the lexicographic order, which we define as follows.

Definition A.0.8 (Lexicographic order). Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\beta_1, \ldots, \beta_n) \in \mathbb{Z}^n_{\geq 0}$. We say $\alpha >_{\mathrm{lex}} \beta$ if in the
vector difference $\alpha - \beta \in \mathbb{Z}^n$, the leftmost non-zero entry is positive. We will write $x^{\alpha} >_{\mathrm{lex}} x^{\beta}$ if $\alpha >_{\mathrm{lex}} \beta$.

Once we fix a monomial order, each $f \in k[x_1, \ldots, x_n]$ has a unique leading term $\mathrm{LT}(f)$ relative to this order. We denote
by $\mathrm{LT}(I)$ the set of leading terms of elements of the ideal $I$. We can then define $\langle \mathrm{LT}(I) \rangle$ to be the ideal generated by the
elements of $\mathrm{LT}(I)$. Consider a finitely generated ideal $I = \langle f_1, \ldots, f_s \rangle$. It is interesting to note that $\langle \mathrm{LT}(f_1), \ldots, \mathrm{LT}(f_s) \rangle$
and $\langle \mathrm{LT}(I) \rangle$ may in general be different ideals. But surprisingly there always exists Q a choice of basis $g_1, \ldots, g_t \in I$
such that $\langle \mathrm{LT}(g_1), \ldots, \mathrm{LT}(g_t) \rangle = \langle \mathrm{LT}(I) \rangle$. These bases are known as Groebner bases.
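The lexicographic comparison and the leading term can be made concrete with a few lines of code. The following is only a minimal sketch: representing a polynomial as a dictionary from exponent tuples to coefficients, and the helper names `lex_greater` and `leading_term`, are illustrative assumptions rather than anything prescribed by the text.

```python
from fractions import Fraction

def lex_greater(alpha, beta):
    """True if alpha >_lex beta: the leftmost non-zero entry of alpha - beta is positive."""
    for a, b in zip(alpha, beta):
        if a != b:
            return a > b
    return False  # identical exponent vectors are not strictly greater

def leading_term(poly):
    """Return (exponent, coefficient) of the lex-largest monomial of a non-zero polynomial."""
    lead = None
    for alpha in poly:
        if lead is None or lex_greater(alpha, lead):
            lead = alpha
    return lead, poly[lead]

# f = 3 x^2 y - 5 x y^3 + 7 z in k[x, y, z], with x > y > z
f = {(2, 1, 0): Fraction(3), (1, 3, 0): Fraction(-5), (0, 0, 1): Fraction(7)}
print(leading_term(f))
```

Under lex order the term $3x^2y$ leads even though $-5xy^3$ has higher total degree, because the comparison looks at the exponent of $x$ first.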

Definition A.0.9. Fix a monomial ordering. A finite subset $G = \{g_1, \ldots, g_t\}$ of an ideal $I$ is said to be a Groebner basis
if

$$\langle \mathrm{LT}(g_1), \ldots, \mathrm{LT}(g_t) \rangle = \langle \mathrm{LT}(I) \rangle.$$

More informally, a set $G = \{g_1, \ldots, g_t\} \subseteq I$ is a Groebner basis for $I$ if and only if the leading term of any element
of $I$ is divisible by (at least) one of the $\mathrm{LT}(g_i)$. Groebner bases simplify performing many calculations in algebraic
geometry and they have many interesting properties, some of which we will see shortly. There are efficient algorithms for
calculating Groebner bases and many software packages that one can use to implement them. An example of a Groebner
basis was given in the main text. Our use of the Groebner basis in that example was a special case of a general result,
known as the elimination theorem, which provides us with a way of using Groebner bases to systematically eliminate
certain variables from a system of polynomial equations and thereby solve the implicitization problem. We will state the
elimination theorem shortly. First, we require the following definition.

Definition A.0.10. Given $I = \langle g_1, \ldots, g_t \rangle \subseteq k[x_1, \ldots, x_n]$, the $l$th elimination ideal $I_l$ is the ideal of $k[x_{l+1}, \ldots, x_n]$
defined by

$$I_l = I \cap k[x_{l+1}, \ldots, x_n].$$

Thus $I_l$ consists of all consequences of $g_1 = \cdots = g_t = 0$ which eliminate the variables $x_1, \ldots, x_l$. Using this language,
we see that eliminating $x_1, \ldots, x_l$ means finding non-zero polynomials in the $l$th elimination ideal $I_l \subseteq k[x_{l+1}, \ldots, x_n]$.
With the proper ordering, Groebner bases allow us to do this instantly. We can now state the elimination theorem (for a
proof, see Q).

Theorem A.0.11 (Elimination theorem). Let $I \subseteq k[x_1, \ldots, x_n]$ be an ideal and let $G$ be a Groebner basis for $I$ with
respect to the lex order where $x_1 > x_2 > \cdots > x_n$. Then, for every $0 \leq l \leq n$, the set

$$G_l = G \cap k[x_{l+1}, \ldots, x_n]$$

is a Groebner basis of the $l$th elimination ideal $I_l$.
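In practice the theorem says that, once a lex Groebner basis is known, elimination is just a filter on the basis. The sketch below illustrates this on the three-element basis of the first example in Appendix B. The coefficients are an assumed transcription of that basis in reduced form (variable order $q_1 > q_2 > p_{00} > p_{01} > p_{11}$, with $q_i$ read as the probability that latent bit $i$ is 0), and the helper names `elimination_part` and `evaluate` are hypothetical.

```python
from fractions import Fraction

# Exponent tuples are ordered as (q1, q2, p00, p01, p11); coefficients are an assumed transcription.
G = {
    "g1": {(1, 0, 0, 0, 0): 1, (0, 1, 0, 0, 0): 1, (0, 0, 0, 1, 0): 1,
           (0, 0, 0, 0, 1): 2, (0, 0, 0, 0, 0): -2},
    "g2": {(0, 0, 1, 0, 0): 1, (0, 0, 0, 1, 0): 1, (0, 0, 0, 0, 1): 1, (0, 0, 0, 0, 0): -1},
    "g3": {(0, 2, 0, 0, 0): 1, (0, 1, 0, 0, 1): 2, (0, 1, 0, 1, 0): 1, (0, 1, 0, 0, 0): -2,
           (0, 0, 0, 1, 0): -1, (0, 0, 0, 0, 1): -1, (0, 0, 0, 0, 0): 1},
}

def elimination_part(basis, l):
    """Names of the basis elements that involve none of the first l variables."""
    return sorted(name for name, poly in basis.items()
                  if all(alpha[i] == 0 for alpha in poly for i in range(l)))

def evaluate(poly, point):
    """Evaluate a polynomial (dict of exponent tuple -> coefficient) at a point."""
    total = Fraction(0)
    for alpha, coeff in poly.items():
        term = Fraction(coeff)
        for x, e in zip(point, alpha):
            term *= x ** e
        total += term
    return total

print(elimination_part(G, 1))  # basis elements free of q1
print(elimination_part(G, 2))  # basis elements free of q1 and q2

# Sanity check at a model point: q1 = 3/4, q2 = 1/4.
q1, q2 = Fraction(3, 4), Fraction(1, 4)
point = (q1, q2, q1 * q2, q1 * (1 - q2) + (1 - q1) * q2, (1 - q1) * (1 - q2))
print([evaluate(g, point) for g in G.values()])
```

Only the normalisation polynomial survives once both $q_1$ and $q_2$ are eliminated, which is why the inequality constraint has to come from the extension step rather than from the elimination ideal.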


Fig. 7: (a) $A = \mu\nu$ and $B = \mu \oplus \nu \oplus \mu\nu$. (b) $(p_{01} + 2p_{00})^2 \geq 4p_{00}$.


So, in our example with the fan depicted in Fig. 1(b), discussed in the main text, $g_3$ and $g_4$ form a Groebner basis of the
2nd elimination ideal and this is what allowed us to eliminate the variables $q_1$ and $q_2$.

How do we know that we can extend solutions from the $l$th elimination ideal to the $(l-1)$th? More concretely, in our
specific example of the fan, how do we know that the equation $p_{00}p_{01} = p_{10}p_{11}$ defines the entire algebraic variety and
not just some part of it? The following result shows us the conditions under which we can extend partial solutions to full
ones.

Theorem A.0.12 (Extension theorem). Let $I = \langle f_1, \ldots, f_s \rangle \subseteq \mathbb{C}[x_1, \ldots, x_n]$ and let $I_1$ be the first elimination ideal of $I$. For each
$1 \leq i \leq s$, write $f_i$ in the form

$$f_i = g_i(x_2, \ldots, x_n)\, x_1^{N_i} + \text{terms of lower degree in } x_1,$$

where $N_i \geq 0$ and $g_i \in \mathbb{C}[x_2, \ldots, x_n]$ is non-zero. Suppose we have a partial solution $(a_2, \ldots, a_n) \in V(I_1)$. If
$(a_2, \ldots, a_n) \notin V(g_1, \ldots, g_s)$, then there exists $a_1 \in \mathbb{C}$ such that $(a_1, \ldots, a_n) \in V(I)$.

When we work over $(0, 1) \subset \mathbb{R}$ we also, in conjunction with the conditions of the above theorem, need to ensure that at
every extension step the new solution is real and lies in $(0, 1)$.

We can apply the above theorem to our example to see that, indeed, the equation $p_{00}p_{01} = p_{10}p_{11}$ defines the smallest
algebraic variety that contains the semi-algebraic set depicted in Fig. 1(b) in the main text.

B More examples of deriving tests for feasibility 

Consider the functional causal structure of Fig. 7(a). The joint distributions that can arise from it are of the form

$$P(A, B) = q_1 q_2 [00] + (q_1 \bar{q}_2 + \bar{q}_1 q_2)[01] + \bar{q}_1 \bar{q}_2 [11],$$

where $\bar{q}_i = 1 - q_i$. The semi-algebraic set defined by $P(A, B)$ is shown in Fig. 7(b). We refer to this variety as a StarFleet insignia. The
Groebner basis for the ideal

$$\langle p_{00} - q_1 q_2,\; p_{01} - q_1 \bar{q}_2 - \bar{q}_1 q_2,\; p_{11} - \bar{q}_1 \bar{q}_2 \rangle,$$

with respect to the lex order $q_1 > q_2 > p_{00} > p_{01} > p_{11}$, is

$$g_1 = q_1 + q_2 + p_{01} + 2p_{11} - 2$$

$$g_2 = p_{00} + p_{01} + p_{11} - 1$$

$$g_3 = q_2^2 + 2p_{11} q_2 + p_{01} q_2 - 2q_2 - p_{11} - p_{01} + 1.$$

The equation $g_2 = 0$ defines an equality constraint that restricts the joint probability distribution to the plane $p_{10} = 0$ and
therefore to the face of the tetrahedron containing the vertices [00], [01] and [11]. In order to extend the partial solution
$\{p_{00}, p_{01}, p_{11}\}$ to a full solution $\{q_1, q_2, p_{00}, p_{01}, p_{11}\}$ using the extension theorem, we must ensure that all the solutions
are real. Now the equation $g_3 = 0$ allows us to write $q_2$ in terms of the $p_{ij}$'s as follows

$$q_2 = \frac{-(p_{01} + 2p_{11} - 2) \pm \sqrt{(p_{01} + 2p_{11} - 2)^2 + 4(p_{11} + p_{01} - 1)}}{2}.$$

So in order to ensure that $q_2 \in \mathbb{R}$, we must set $(p_{01} + 2p_{11} - 2)^2 + 4(p_{11} + p_{01} - 1) \geq 0$. Using the normalisation
condition and rearranging gives us

$$(p_{01} + 2p_{00})^2 \geq 4p_{00},$$

which defines the semi-algebraic set depicted in Fig. 7(b). None of the remaining constraints $0 \leq q_i \leq 1$, for $i = 1, 2$,
result in non-trivial relations among the $p_{ij}$'s.
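The resulting feasibility test is easy to run on data. The sketch below is one possible implementation under stated assumptions: the model is taken to be $A = \mu\nu$, $B = \mu \oplus \nu \oplus \mu\nu$ with $q_i$ the probability that latent bit $i$ equals 0, exact rational arithmetic is used so that the equality constraint can be tested literally, and the function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def model_distribution(q1, q2):
    """(p00, p01, p10, p11) generated by the assumed model for latent parameters q1, q2."""
    return (q1 * q2, q1 * (1 - q2) + (1 - q1) * q2, Fraction(0), (1 - q1) * (1 - q2))

def is_feasible(p00, p01, p10, p11):
    """Normalisation, non-negativity, the equality p10 = 0 and the discriminant inequality."""
    normalised = p00 + p01 + p10 + p11 == 1
    nonnegative = min(p00, p01, p10, p11) >= 0
    return normalised and nonnegative and p10 == 0 and (p01 + 2 * p00) ** 2 >= 4 * p00

def recover_parameters(p00, p01):
    """Roots of x^2 - (p01 + 2 p00) x + p00 = 0, i.e. candidate values of (q1, q2), or None."""
    s = p01 + 2 * p00
    disc = s * s - 4 * p00
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (s + root) / 2, (s - root) / 2

grid = [Fraction(k, 4) for k in range(5)]
print(all(is_feasible(*model_distribution(q1, q2)) for q1, q2 in product(grid, repeat=2)))  # True
print(is_feasible(Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2)))  # False
print(recover_parameters(Fraction(3, 16), Fraction(10, 16)))
```

The second distribution lies on the $p_{10} = 0$ face but puts equal weight on [00] and [11] with nothing on [01], which would require $q_1 q_2 = \bar{q}_1 \bar{q}_2 = 1/2$ with no cross terms, so the test rejects it.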





Fig. 8: (a) $A = \mu\nu$ and $B = \mu \oplus \nu \oplus \delta$. (b) $|4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11})| \leq (p_{11}(2p_{01} + 2p_{10} + p_{00}) - p_{10}(2p_{00} +
2p_{11} + p_{01}))^2$.


Consider the functional causal structure of Fig. 8(a). The joint distributions that can arise from it are of the form

$$P(A, B) = (q_1 q_2 q_3 + q_1 \bar{q}_2 \bar{q}_3 + \bar{q}_1 q_2 \bar{q}_3)[00]
+ (q_1 q_2 \bar{q}_3 + q_1 \bar{q}_2 q_3 + \bar{q}_1 q_2 q_3)[01] + \bar{q}_1 \bar{q}_2 q_3 [10] + \bar{q}_1 \bar{q}_2 \bar{q}_3 [11].$$

The semi-algebraic set defined by $P(A, B)$ is shown from different angles in Fig. 8(b). We note that conditioning on
the variable $\delta$ being equal to 0 or 1 reduces this variety to one of the StarFleet insignia depicted on the faces. Similarly
conditioning on $\nu = 1$ (or $\mu = 1$) reduces this variety to a fan.

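The Groebner basis for the ideal

$$\langle p_{00} - q_1 q_2 q_3 - q_1 \bar{q}_2 \bar{q}_3 - \bar{q}_1 q_2 \bar{q}_3,\; p_{01} - q_1 q_2 \bar{q}_3 - q_1 \bar{q}_2 q_3 - \bar{q}_1 q_2 q_3,\; p_{10} - \bar{q}_1 \bar{q}_2 q_3,\; p_{11} - \bar{q}_1 \bar{q}_2 \bar{q}_3 \rangle,$$

with respect to the usual lex order, is given by

$$g_1 = q_3 p_{10} + q_3 p_{11} - p_{10}$$

$$g_2 = p_{00} + p_{01} + p_{10} + p_{11} - 1$$

$$g_3 = q_2 q_1 - q_1 - q_2 - p_{10} - p_{11} + 1$$

$$g_4 = 2q_3 q_1 - q_1 - q_2 + 2q_2 q_3 - 3q_3 + p_{01} + 2p_{10} - p_{11} + 1$$

$$g_5 = 2q_3 q_2^2 - q_2^2 - 3q_3 q_2 + p_{01} q_2 + 2p_{10} q_2 - p_{11} q_2 + q_2 + q_3 - p_{01} - p_{10}$$

$$g_6 = 2p_{10}^2 + q_1 p_{10} + q_2 p_{10} + p_{10} p_{01} + p_{11} p_{10} - 2p_{10} - p_{11}^2 - q_1 p_{11} - q_2 p_{11} + p_{01} p_{11} + p_{11}$$

$$g_7 = p_{10} q_2^2 - p_{11} q_2^2 + 2p_{10}^2 q_2 - p_{11}^2 q_2 + p_{01} p_{10} q_2 - 2p_{10} q_2
+ p_{01} p_{11} q_2 + p_{10} p_{11} q_2 + p_{11} q_2 - p_{10}^2 - p_{01} p_{10} + p_{10} - p_{01} p_{11} - p_{10} p_{11}.$$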

The equation $g_2 = 0$ is just the usual normalisation condition restricting the joint probability distribution to the
tetrahedron. In order to use the extension theorem to extend a partial solution $\{p_{00}, p_{01}, p_{10}, p_{11}\}$ to a full solution
$\{q_1, q_2, q_3, p_{00}, p_{01}, p_{10}, p_{11}\}$, we must ensure that each solution is real. The only situation in which we need to impose
this is in the case of $q_2$. The equation $g_7 = 0$ is a quadratic in $q_2$ and in order for its solutions to be real, we must
stipulate that

$$4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11}) \leq (p_{11}(2p_{01} + 2p_{10} + p_{00}) - p_{10}(2p_{00} + 2p_{11} + p_{01}))^2.$$

Using $g_1 = 0$ to write $q_3$ in terms of $p_{10}$ and $p_{11}$ and substituting this into $g_5 = 0$ gives us another quadratic in $q_2$. For
the solutions of this quadratic to be real we must enforce that

$$4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11}) \geq -(p_{11}(2p_{01} + 2p_{10} + p_{00}) - p_{10}(2p_{00} + 2p_{11} + p_{01}))^2.$$

Combining these two inequalities we get

$$|4(p_{10} - p_{11})(p_{00}p_{10} - p_{01}p_{11})| \leq (p_{11}(2p_{01} + 2p_{10} + p_{00}) - p_{10}(2p_{00} + 2p_{11} + p_{01}))^2,$$

where $|\cdot|$ denotes the absolute value. None of the remaining constraints $0 \leq q_i \leq 1$, for $i = 1, 2, 3$, result in non-trivial
relations among the $p_{ij}$'s.
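As a numerical sanity check of the combined inequality, one can enumerate the latent bits directly and confirm that every distribution the model produces passes the test. This is a sketch under the assumption that the model is $A = \mu\nu$, $B = \mu \oplus \nu \oplus \delta$ with $q_i$ the probability that the $i$th latent bit is 0; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def model_distribution(q1, q2, q3):
    """(p00, p01, p10, p11) obtained by summing over the latent bits mu, nu, delta."""
    probs = {(a, b): Fraction(0) for a in (0, 1) for b in (0, 1)}
    for mu, nu, delta in product((0, 1), repeat=3):
        weight = Fraction(1)
        for bit, q in ((mu, q1), (nu, q2), (delta, q3)):
            weight *= q if bit == 0 else 1 - q
        probs[(mu * nu, mu ^ nu ^ delta)] += weight
    return probs[(0, 0)], probs[(0, 1)], probs[(1, 0)], probs[(1, 1)]

def satisfies_inequality(p00, p01, p10, p11):
    """The absolute-value inequality relating the fan term to the squared bracket."""
    lhs = abs(4 * (p10 - p11) * (p00 * p10 - p01 * p11))
    rhs = (p11 * (2 * p01 + 2 * p10 + p00) - p10 * (2 * p00 + 2 * p11 + p01)) ** 2
    return lhs <= rhs

grid = [Fraction(k, 4) for k in range(5)]
print(all(satisfies_inequality(*model_distribution(*q)) for q in product(grid, repeat=3)))  # True
print(satisfies_inequality(Fraction(1, 2), Fraction(0), Fraction(1, 2), Fraction(0)))  # False
```

The rejected distribution puts weight 1/2 on each of [00] and [10], so a distribution that is normalised and non-negative can still fail the test.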

Examining this inequality more closely, we see that setting $p_{10} = 0$ reduces it to the inequality defining the StarFleet
insignia, $4p_{01} \leq (2p_{01} + p_{00})^2$, on the face spanned by {[00], [01], [11]}, as it should (this is visible in Fig. 8(b)). Similarly,
for $p_{11} = 0$ we get the StarFleet insignia on the face {[00], [01], [10]} (also visible in Fig. 8(b)). The appearance of the
term $p_{00}p_{10} - p_{01}p_{11}$ is also noteworthy. Recall that the equation $p_{00}p_{10} = p_{01}p_{11}$ defines the fan depicted in Fig. 2(b)
in the main text, so the above inequality quantitatively bounds the deviation from the surface of this fan by an amount
proportional to the two StarFleet insignia discussed above. This is intuitively what we would expect from looking at the
semi-algebraic set depicted in Fig. 8(b).
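For readers who want to see the two face reductions spelled out, the substitutions can be written as follows. This is only a worked check of the statement above; it assumes $p_{11} \neq 0$ in the first case and $p_{10} \neq 0$ in the second so that the common factor can be cancelled, a condition that is not stated explicitly in the text.

```latex
\begin{aligned}
p_{10} = 0:\quad
  & |4(-p_{11})(-p_{01}p_{11})| \leq \bigl(p_{11}(2p_{01} + p_{00})\bigr)^2 \\
  \Longleftrightarrow\quad & 4p_{01}\,p_{11}^2 \leq p_{11}^2\,(2p_{01} + p_{00})^2 \\
  \Longleftrightarrow\quad & 4p_{01} \leq (2p_{01} + p_{00})^2, \\[4pt]
p_{11} = 0:\quad
  & |4\,p_{10}\,(p_{00}p_{10})| \leq \bigl(-p_{10}(2p_{00} + p_{01})\bigr)^2 \\
  \Longleftrightarrow\quad & 4p_{00}\,p_{10}^2 \leq p_{10}^2\,(2p_{00} + p_{01})^2 \\
  \Longleftrightarrow\quad & 4p_{00} \leq (2p_{00} + p_{01})^2 .
\end{aligned}
```

The two reduced inequalities have the same form with the roles of $p_{00}$ and $p_{01}$ exchanged, matching the two insignia visible on the respective faces.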

These examples cover all the different situations one may encounter while using algebraic geometry techniques to derive 
tests for feasibility of the causal models we are considering in this work. The remaining tests are derived in an analogous 
fashion. 

C Sufficiency of n-latent-bit models

We here present the proof of Theorem 4.2.1.

The example presented in section 4.2 suggests a general procedure for replacing an $m$-valued latent variable with some
finite number of binary latent variables. Replace the $m$-valued variable with a number of substitute variables (the analogues
of $\gamma$ and $\eta$ above, but which now can take an arbitrary number of values) such that any distribution over the
$m$-valued variable can be simulated using a $k$-latent-bit causal model (the analogue of the causal model containing $\mu$ and
$\nu$ above) underlying the substitute variables. By eliminating the intermediary variables, the dependence of the observed
variables on the $m$-valued latent variable is replaced with a dependence on $k$ binary latent variables.

We now describe a procedure for replacing an $m$-valued variable, for any $m$, by two variables $\gamma$ and $\eta$ in such a way that
any distribution over the $m$-valued variable is obtained by some $k$-latent-bit causal model underlying $\gamma$ and $\eta$.

Recall that for a 3-valued variable, we can take $\gamma$ and $\eta$ to be bits and use the fiducial model from class (2, 1, c)id, whose
distribution is the convex combination of an edge of the tetrahedron and a vertex not contained in that edge. Similarly, for
a 4-valued variable, we can take $\gamma$ and $\eta$ to be bits and use the fiducial model from class (3, 2, g)id, whose distribution is
the convex combination of a face and vertex not contained in that face.

For a 5-valued variable, we can take $\gamma$ to be a trit and $\eta$ to be a bit. For any causal model underlying $\gamma$ and $\eta$, the
semi-algebraic set generated by this model is now a subset of a simplex with six vertices, $[\gamma\eta] \in \{[00], [01], [10], [11], [20], [21]\}$.
We now construct a causal model underlying $\gamma$ and $\eta$ by combining two simpler models, using the procedure described
in section 4 in the main text: the first model is one whose semi-algebraic set is the tetrahedron (considered as the subset
of the six-simplex having $[\gamma\eta] \in \{[00], [01], [10], [11]\}$) and the second is one whose semi-algebraic set is a vertex of
the six-simplex not contained in the tetrahedron. A binary switch variable toggles between these two simpler models.
Given the geometry, the semi-algebraic set defined by the model is clearly the convex combination of the tetrahedron
and the vertex outside the tetrahedron. In particular, we can take the first model to be the fiducial model from class
(3, 3)id (where $\gamma$ is replaced by a trit but its dependence on its causal parents is unchanged) and the second model to be a
deterministic model that sets $\gamma = 2$ and $\eta = 0$. Denoting the switch variable by $\rho$, and the other latent bits by $\mu, \nu, \delta$ (as
in the row containing class (3, 3)id), we obtain the following functional dependences by the switch-variable construction:
$\gamma = \rho(\mu\nu \oplus_2 1) \oplus_3 2(\rho \oplus_2 1)$ and $\eta = \rho(\mu\nu\delta \oplus_2 \nu)$. One easily verifies that if $\rho = 1$, one recovers the fiducial model
of class (3, 3)id and hence the tetrahedron spanned by $[\gamma\eta] \in \{[00], [01], [10], [11]\}$, while if $\rho = 0$, one obtains the point
$[\gamma\eta] = [20]$.
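The claim about the two settings of the switch can be checked by brute-force enumeration. The sketch below assumes the functional dependences read $\gamma = s(\mu\nu \oplus_2 1) \oplus_3 2(s \oplus_2 1)$ and $\eta = s(\mu\nu\delta \oplus_2 \nu)$, where `s` is a hypothetical name for the switch bit and $\oplus_2$, $\oplus_3$ denote addition modulo 2 and 3. It only checks which outcomes $[\gamma\eta]$ are reachable, not the full set of distributions.

```python
from itertools import product

def gamma_eta(s, mu, nu, delta):
    """Outcome (gamma, eta) for switch bit s and latent bits mu, nu, delta (assumed formulas)."""
    gamma = (s * ((mu * nu) ^ 1) + 2 * (s ^ 1)) % 3  # outer sum is taken modulo 3
    eta = s * ((mu * nu * delta) ^ nu)               # XOR implements addition modulo 2
    return gamma, eta

reachable = {
    s: {gamma_eta(s, *bits) for bits in product((0, 1), repeat=3)}
    for s in (0, 1)
}
print(sorted(reachable[0]))  # [(2, 0)]
print(sorted(reachable[1]))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With the switch off, every latent configuration collapses to the single vertex [20]; with it on, all four vertices of the tetrahedron are reachable and [20] is not.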

By increasing the number of values that $\gamma$ and $\eta$ can take, one can ensure that the number of vertices in the space of
distributions over $\gamma$ and $\eta$ is at least $m$, such that one can simulate an $m$-valued latent variable by finding a causal model
underlying $\gamma$ and $\eta$ whose semi-algebraic set is an $m$-simplex. To construct such a model, we apply the switch-variable
construction to a pair of simpler models, one of which has a semi-algebraic set corresponding to an $(m-1)$-simplex,
and the other of which is a deterministic model corresponding to a vertex outside of this $(m-1)$-simplex. In this way, we
can recursively build up a causal model involving only binary latent variables whose semi-algebraic set is an $m$-simplex
for any $m$.





